ABSTRACT


Mobasseri, Bijan Gholamreza. Ph.D., Purdue University,
May 1978. A Parametric Multiclass Bayes Error Estimator
for the Multispectral Scanner Spatial Model Performance
Evaluation. Major Professor: C. D. McGillem.

Efficient acquisition and utilization of remotely
sensed data requires an extensive a priori evaluation of the
performance of the basic data collection unit, the
multispectral scanner. The objective is the development of a
fully parametric technique to theoretically evaluate the
system's response in any desired operational environment
and provide the necessary information for selecting a set
of optimum parameters.

The probability of correct classification of the
various populations in the data is defined as the primary
performance index. Because the multispectral data is also
of a multiclass nature, a Bayes error estimation
procedure is required that depends on a set of class
statistics alone. The underlying problem facing the development
of such a technique is discussed and a solution based upon
sampling of the feature space is proposed. The
classification error estimator is expressed in terms of an N
dimensional integral where N is the dimensionality of the
feature space. A set of successive linear transformations
prior to the error estimation process provides an N to 1
dimensionality reduction by reducing the Bayes error
estimate to a product of N one dimensional integrals.
The statistical properties of the estimate are formulated
and its relationship with the geometry of the decision
boundaries is discussed.

The multispectral scanner spatial model is represented 
by a linear shift-invariant multiple-port system where the 
N spectral bands comprise the input processes. The scanner 
characteristic function, the relationship governing the 
transformation of the input spatial and hence spectral 
correlation matrices through the system, is developed.
Specific cases for Gaussian and rectangular point spread 
functions are examined. Random noise is considered and 
its interpretation in the context of multispectral data 
is discussed. 

In order to validate the performance of the Bayes error
estimation algorithm, multivariate normal data
is simulated and the classification accuracy of a set of
test cases is determined by both the parametric and
Monte-Carlo methods. The comparison of the results provides the
information required to evaluate the theoretical
Bayes error estimator.

The integration of the scanner spatial model and the
parametric classification error estimator provides the
necessary technique to evaluate the performance of a
multispectral scanner. A set of test statistics is
specified and the corresponding output quantities are computed by
the characteristic function. Two sets of classification
accuracies, one at the input and one at the output, are
estimated. The scanner's instantaneous field of view is changed
and the variation of the output classification performance
monitored. The same procedure is followed with additive
noise at the scanner output.

In conclusion, on the basis of these theoretical results,
the interaction between the classification accuracy,
signal-to-noise ratio, spatial resolution, data spatial
correlation and scanner aperture is explained and some
suggestions regarding the selection of optimum system
parameters are presented.
CHAPTER 1
Introduction 

The utilization of earth orbiting platforms as a means 
of environmental data acquisition has undergone a tremendous 
growth in the past decade. The feasibility of such tech- 
niques was first demonstrated using a multispectral scanner 
(MSS) carried in a low flying aircraft. The launching of the
Earth Resources Technology Satellite (ERTS), later
renamed Landsat, greatly increased the scope of remote
sensing technology [1]. From a polar orbit with a
repetitive coverage period of 18 days, a variety of
agricultural and environmental data are collected and telemetered
to the ground for processing. On board, a rotating mirror
multispectral scanner operating in four nonoverlapping bands
of electromagnetic radiation constitutes the main component
of the data collection system [2].

The electromagnetic energy reflected by a target is
decomposed into four spectral bands and then transmitted to
earth through a PCM channel. The signal degradations caused
by various transformations within the scanner subsystem are
of great importance. The finite scanner aperture and the
atmospheric and quantization noise are but some of the
contributing factors. The optimization of the entire set of
interactive parameters within the scanner can be quite
involved. From an information processing view, however, five
major categories emerge.

1. Spectral band location in the electromagnetic spectrum

2. Spectral bandwidth

3. Number of spectral bands

4. Spatial resolution

5. Signal-to-noise ratio

Due to the finite capabilities of scanner and data 
analysis techniques, the continuum of the electromagnetic 
spectrum cannot be fully utilized. Therefore, sampling of the 
spectrum becomes essential. The band location is generally 
determined by the target spectral characteristics such that 
different cover types exhibit different spectral signatures 
in the same band. The wavelength limits can be shifted
somewhat to improve crop identification, but the spectral
bandwidth cannot be decreased very much for a fixed
signal-to-noise ratio (SNR). The SNR decreases with decreasing bandwidth.

The spatial resolution has a direct relationship with 
the signal-to-noise ratio and the classification accuracy. 

An increase in the resolution requires a narrower aperture 
which in turn leads to decreased SNR, reduced classification 
performance and a smaller area scanned for the same data rate. 
For a coarse resolution, the scanner aperture is wider, SNR
is higher, classification error rate in general increases, but
'mixed pixels', due to averaging of the adjacent field pixels,
will arise.

In a multispectral, remotely sensed data gathering system
the final and most important result is the association of
each resolution element on the ground with a previously
determined population and the evaluation of the performance of
such a classification operation. The selection of the
parameters within a scanner has as a primary aim the
minimization of the probability of misclassification (PMC) of the
data. Thus, the classification performance is an indicator
against which the choice of other system parameters can be
compared.


1.1 Statement of the Problem and a Desired
Operational Framework

The reflected energy from agricultural and other cover
types of interest is corrupted by various noise sources,
reshaped by the finite scanner point spread function (PSF) and
then quantized and transmitted back to the ground stations
for processing. At the Purdue University Laboratory for
Applications of Remote Sensing (LARS), the remotely sensed
data is analyzed by classifying it into one of M populations
by an optimum (minimum probability of error) Bayes classifier.
We define the resulting probability of correct classification
as the index of performance for a multispectral scanner and
describe the goal as the evaluation and simulation of an
optimum multispectral scanner system within the framework of
interactive relationships between the spatial resolution,
signal-to-noise ratio and classification error rate. Implicit
in this statement is the assumption that the spectral band location
and bandwidth have already been optimally selected as part
of the system design process.

The classification accuracy obtained by processing the 
actual data is necessarily suboptimum due to the
aforementioned degradation sources. A reference PMC could be defined
by analyzing the performance using the reflected signal at
the scanner input, even though this signal is obviously
inaccessible. By simulating a theoretical model for the MSS,
however, the classification error rate can be evaluated and 
compared at the scanner input and output thereby establish- 
ing an upper bound on the system performance in the context 
of the defined index of performance. Arbitrary spatial reso- 
lution can be specified and its interactive relationship with 
the SNR and PMC studied. 

This interrelationship has been investigated before and 
some general trends are known. In one experiment [3], initial 
high resolution aircraft data was classified and the cor- 
responding classification accuracy determined. Lower resolu- 
tion scanner PSF's were then specified and convolved with the 
aircraft scanner data to generate a coarser resolution data 
base. Transformations were carried out for different PSF's
and it was concluded that the corresponding PMC's were a
decreasing function of the spatial resolution.


The technique employed in [3] is inherently empirical 
due to the utilization of actual data in the simulation
process. Two potential shortcomings of this procedure can be
cited: (a) the multispectral signal used is already
corrupted by the degradation sources and their effects cannot
be isolated; and (b) the accuracy of the system performance
calculations is dependent on the size of the available data
set. In many applications the data availability can be
limited due to the cost, ease of acquisition, availability of
ground truth, etc. In particular, by convolving an initial
data base with a cascade of scanner PSF's to generate a low 
resolution set, the averaging property of the convolution 
causes successive reductions in the numerical size of the 
convolved data and directly affects the statistics and the 
corresponding estimate of the classification accuracy. 

The need for a different algorithm to simulate a
multispectral scanner and evaluate its theoretical performance
has been demonstrated. This method, in order to be as
flexible as possible, should depend entirely on the parameters
of the model, i.e., population statistics, scanner PSF, noise
level, etc. Fig. 1-1 is a basic block diagram of the desired
MSS model and the performance evaluation process. X is the
multispectral feature vector. The scanner model is a linear
system with specified PSF. The statistical description of the
scene is computed at both the scanner input and output,
$f(X)$ and $f'(X)$. The corresponding probabilities of correct
classification, $\hat{P}_c$ and $\hat{P}'_c$, are provided by the
classification error estimator. This realization of the process
is highly parametric and displays minimum dependence on X. For a
given geographical scene, the scanner PSF and additive noise can be
varied and the resulting interaction with $\hat{P}'_c$ observed. Each
of the blocks in Fig. 1-1 is composed of various subsystems
which will be considered in more detail in later chapters.

The projected algorithm will have several capabilities. 

The most important one is the ease of parameter manipulation. 
Variation of the scanner spatial resolution will cause the 
output statistics to be modified with a corresponding var- 
iation in the estimate of the classification error. Similarly, 
variations in the population separability at the scanner in- 
put and the resulting interaction with the PMC can be studied. 

This built-in flexibility is a desirable and almost im- 
perative feature of the scanner system modeling. A specific 
example is the class statistics manipulation. The generation 
of a new data set, with prescribed statistics, from the exist- 
ing data set, requires appropriate software and, depending on 
the data base magnitude, can be potentially time consuming. 

The alternative in the proposed algorithm is to supply the 
statistics alone. 

The following comment is in order here. Much emphasis
has been placed on the data-independent feature of the
algorithm. It is clear that this requirement can only be
carried so far. Whatever the method, the population statistics
must be specified. This condition can be satisfactorily
met only by access to an available data set, quality
notwithstanding. The distinction that emerges at this point is that
the contribution of the data to the final result ends at this
stage for a parametric model, whereas the data utilization
will continue throughout the model for an empirical scheme,
with an error compounding effect.

The MSS simulated model, being a linear system, lends 
itself to well established system theory methods and, de- 
pending on the functions involved, closed form relationships 
relate the scanner input and output statistics. For the 
block diagram of Fig. 1-1 to be operational, the contents of 
the classification error estimator element must be specified. 
The input to the block is the set of population statistics in
the form of M mean vectors and covariance matrices and
the output is a set of M performance indices, i.e., the
probabilities of correct classification.

The parametric Bayes error estimator is developed in 
chapter 2. The resulting algorithm requires the data 
spectral covariance matrices as the only input and produces 
a set of probabilities of correct classification for each 
population. In chapter 3 the MSS and multispectral data
spatial model is discussed and the desired spectral transfer
functions obtained. The experimental results, in the form
of validation of the classification error estimator using a
set of test cases, are covered in chapter 4. The scanner
spatial model is evaluated in chapter 5 and the associated
relationships between the MSS spatial parameters, scene
correlation, noise and classification accuracies are discussed.
A summary and suggestions for further work are the topic
of chapter 6.





Fig. 1-1 The Basic Block Diagram of the MSS Model. 




CHAPTER 2 


Parametric Bayes Error Estimator in a 
Multiclass Multidimensional Environment 

There are basically two types of data classification
methods available: parametric and nonparametric.
Nonparametric classification, such as nearest neighbor, is
independent of the statistical description of the data, requires
access to a large data base and generally is suboptimal
relative to the Bayes classifier. It has been shown that the
multispectral scanner data can be acceptably described by
Gaussian statistics [4]. Therefore, resorting to
nonparametric classification would discard valuable a priori
knowledge that can improve performance.

Parametric classification requires the statistical
description of the data, either exactly or by parameter
estimation. Among all parametric classifiers, Bayes or maximum
likelihood (ML) classifiers are optimum in the minimum
probability of error sense. Although classification of any data set,
parametric or otherwise, is fairly straightforward,
determination of the performance of the classifier is far from
straightforward. The complexity of the problem is primarily a function
of the dimensionality of the measurement space and, to a
lesser degree, a function of the multiplicity of populations.
Unless the measurement space is limited to a single dimension,
an assumption of very limited applicability, exact error rates
are not known for ML classifiers.

Multispectral data seldom contains only two classes and 
always is of a multidimensional nature. The performance 
calculation for this case has been essentially of a Monte- 
Carlo nature. The classifier is trained on a portion of the 
data and then tested on either the same portion or a different 
segment. The estimate of the probability of error is defined
as

    $\hat{e} = \sum_{i=1}^{M} P(\omega_i)\,\frac{n_i}{N_t}$    (2-1)

where $M$, $P(\omega_i)$, $n_i$ and $N_t$ are the number of populations, a
priori class probability, misclassified samples from class
$\omega_i$ and the total number of available samples, respectively.
$\hat{e}$ is an asymptotically unbiased and consistent estimate of
the PMC [5]. Eq. (2-1), with various modifications, is
practically the only available PMC estimator. The majority 
of the literature on statistical classification has been de- 
voted to the case of two multivariate normal populations with 
heavy emphasis on the equal covariance matrices assumption. 
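
As a rough illustration of the counting estimate in (2-1), the following sketch weights each class's observed error fraction by its a priori probability. It is not the procedure used in this work: the function name `empirical_pmc` and the toy labels are hypothetical, and the sketch assumes that each class's error fraction is formed with the number of test samples drawn from that class, which is only one of the possible modifications of (2-1).

```python
import numpy as np

def empirical_pmc(true_labels, predicted_labels, priors):
    """Counting estimate of the probability of misclassification.

    Assumption of this sketch: the error fraction of class i is
    n_i / N_i, with N_i the number of test samples from class i.
    """
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    estimate = 0.0
    for class_index, prior in enumerate(priors):
        in_class = true_labels == class_index
        n_total = np.count_nonzero(in_class)
        if n_total == 0:
            continue  # no test samples available for this class
        n_wrong = np.count_nonzero(predicted_labels[in_class] != class_index)
        estimate += prior * n_wrong / n_total
    return estimate

# Toy example: two classes, four test samples each.
true = [0, 0, 0, 0, 1, 1, 1, 1]
pred = [0, 0, 0, 1, 1, 1, 0, 0]
pmc = empirical_pmc(true, pred, priors=[0.5, 0.5])
print(pmc)
```

Because the estimate is a ratio of counts, it changes from one test set to another, which is the data dependence that motivates a parametric alternative.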

2.1 Review of Previous Work 
The field of classification and discrimination,
otherwise referred to as allocation, identification, pattern
recognition and pattern selection, has been one of the most
intensely researched areas of statistics and has attracted
contributions from a variety of disciplines. In a bibliography,
Anderson et al. [6] list over 400 papers published before
1967.

In the beginning stages of research (prior to 1930),
the classification problem did not have a precise definition,
and was often considered in the context of testing the equality
of two distributions [7]. The first clear formulation of the
problem is attributed to the pioneering work of Fisher, whose
ideas were first exposed in the works of other people [8].
In his first paper [9], Fisher considered classification as
a problem in multiple taxonomy. For univariate, two
population cases he suggested a rule that would assign the
measurement $x$ to $\omega_i$ if $|x-\bar{x}_i|$ was the smaller of
$|x-\bar{x}_1|$ and $|x-\bar{x}_2|$; a nearest neighbor rule in current
terminology. When measurements were multidimensional, Fisher
reduced the problem to the univariate case by selecting a linear
combination of the measurements, Fisher's linear discriminant
function (LDF), the parameters of which were selected so as to
minimize the ratio of the within class scatter to the between
class scatter. He called this the optimum linear combination.

One of the most significant developments occurred with
the fundamental results of Neyman and Pearson [10]. This was
followed by the formulation of the Bayes rule and minimax
Bayes rule for two populations and known statistics by Welch
[11]. Wald [12] considered the same problem and suggested
replacing the unknown quantities with their maximum likelihood
estimates. Von Mises [13] obtained a rule that would maximize
the minimum probability of correct classification when an
observation is to be assigned to one of several populations.

In a literature survey of a field as diverse as
statistical classification, one necessarily has to focus on the
particular aspects of the subject most relevant to one's work.
Therefore, two broad topics, binary group classification and
multiple group classification under the assumption of
multivariate normal (MVN) statistics, are surveyed.


2.1.1 Classification Into Two MVN Distributions With 
Equal Covariance Matrices 

Let the distribution of X in $\omega_i$ be $N(\mu_i,\Sigma)$, $i=1,2$, where
$\mu_i$ and $\Sigma$ are assumed to be known. This arrangement comprises
a classical case for which precise error expressions exist.
Let $\Delta^2 = (\mu_1-\mu_2)^T \Sigma^{-1} (\mu_1-\mu_2)$ be the Mahalanobis distance, then

    $e_i = Q\left(\frac{\Delta}{2}\right)$    (2-2)

where $Q(a)$ is defined as

    $Q(a) = \frac{1}{\sqrt{2\pi}} \int_a^{\infty} e^{-x^2/2}\,dx$    (2-3)
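
The closed form above is easy to evaluate numerically. The sketch below is illustrative only: the function names `q_function` and `two_class_error` and the example means are hypothetical, and it assumes equal a priori probabilities, under which the error of the optimum rule reduces to the upper normal tail evaluated at half the Mahalanobis distance.

```python
import math
import numpy as np

def q_function(a):
    """Upper tail of the standard normal density, as in (2-3)."""
    return 0.5 * math.erfc(a / math.sqrt(2.0))

def two_class_error(mean_1, mean_2, covariance):
    """Error of (2-2) for a common, known covariance matrix.

    Assumes equal a priori class probabilities.
    """
    diff = np.asarray(mean_1, dtype=float) - np.asarray(mean_2, dtype=float)
    # Squared Mahalanobis distance between the two class means.
    delta_sq = float(diff @ np.linalg.solve(covariance, diff))
    return q_function(math.sqrt(delta_sq) / 2.0)

# Two classes with identity covariance and means two units apart.
error = two_class_error([0.0, 0.0], [2.0, 0.0], np.eye(2))
print(round(error, 4))  # 0.1587
```

Here the Mahalanobis distance is 2, so the error is Q(1), roughly 16 percent for either class.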


This case has been discussed by, among others, Welch
[11], Wald [12] and Rao [14]. The distribution of
classification statistics, if known, can directly provide the error
probability. Anderson [15] proposed his W classification
statistic by substituting the ML estimates of the unknown
parameters in a general likelihood ratio rule (plug-in LR). The
distribution of W proves to be quite complicated to the point
of being impractical. Bowker [16] showed that W can be
represented as a function of two independent 2x2 Wishart matrices,
one of which is noncentral. Bowker and Sitgreaves [17] used
this result to find the asymptotic expansion of the W
distribution function in terms of Hermite polynomials. Teichroew
and Sitgreaves [18] used an empirical sampling technique to
estimate its distribution. Okamoto [19] considered the
statistics of W where the number of degrees of freedom r of S,
the sample pooled covariance matrix, is not necessarily
$n_1+n_2-2$, where $n_1$ and $n_2$ are the random sample sizes from
$\omega_1$ and $\omega_2$. He then obtained an asymptotic expansion for

    $P[(W-\Delta^2/2)/\Delta < k \mid \omega_1]$    (2-4)

in terms of $n_1$, $n_2$ and $r$ as $n_1$ and $n_2$ tend to $\infty$ and $n_1/n_2$
tends to a constant. John [20,21] obtained the distribution
of the statistics of W when the common covariance matrix is
known.

When the class statistics are based on samples,
$T = T(X,\mu_1,\mu_2,\Sigma)$ is a decision rule whose plug-in estimate, $\hat{T}$,
is obtained by substituting the corresponding sample estimates
for $\mu_1$, $\mu_2$ and $\Sigma$. Then the conditional error probability
based on $\hat{T}$ is given by

    $e_i(\hat{T}) = P[\hat{T} \text{ classifies } X \text{ into } \omega_j \mid \bar{X}_1, \bar{X}_2, S, \mu = \mu_i]$,  $i,j = 1,2$,  $i \neq j$    (2-5)

The unconditional error probabilities of $\hat{T}$ are
$a_i(\hat{T}) = E[e_i(\hat{T})]$. Denote the estimate of $e_i(\hat{T})$ by
$\hat{e}_i(\hat{T})$, where the unknown parameters have been replaced by
their respective ML estimates. $\hat{a}_i(\hat{T})$ is defined similarly.
Let $T_o$ be the minimax rule with known parameters and $\hat{T}_o$ its
plug-in version. John [22] obtained the distribution of $e_i(\hat{T}_o)$ when
$\Sigma$ is known. Dunn and Varady [23], using an empirical Monte-Carlo
technique, considered $1 - a_i(\hat{T}_o)$, $1 - e_i(\hat{T}_o)$ and
$1 - \hat{e}_i(\hat{T}_o)$ and derived a confidence interval for $e_i(\hat{T}_o)$.
Lachenbruch [24] introduced his leaving-one-out method and
obtained an almost unbiased estimate for $e_i(\hat{T}_o)$. Hills [25]
showed that when $n_1 = n_2$, $a_i(\hat{T}_o) > a_i(T_o)$. Lachenbruch and
Mickey [26] compared seven estimation techniques by Monte-Carlo
type simulation and concluded that the two most common
methods, resubstituting the training samples for testing and
the plug-in version of Mahalanobis distance, perform
relatively worse than others. Glick [27] showed that as $n_1, n_2 \to \infty$,
$\hat{a}_i(T) \to a_i(T)$ a.s. uniformly in the class of all rules;
moreover, if T is a LR rule, $a_i(\hat{T}) \to a_i(T)$ a.s. and
$\hat{a}_i(\hat{T}) \to a_i(T)$.


2.1.2 Classification Into Two MVN Populations When 
Covariance Matrices Are Unequal. 

This case differs from the equal covariance matrices case
due to the quadratic form of the discriminant function (T
being a Bayes rule). Let the distribution of X in $\omega_i$ be
$N(\mu_i,\Sigma_i)$, $i=1,2$. Classification statistics again have
been a point of interest. Assuming that all the relevant
parameters are known, Cooper [28] studied the optimality of
the quadratic discriminant function under stochastic regimes
other than normal. When covariance matrices are proportional,
Han [29] obtained the distribution of the likelihood ratio
and extended the result to circular matrices [30].

Gilbert [31] considered the effect of inequality of the
covariance matrices on Fisher's linear discriminant function
and concluded that when $\Sigma_2 = d\,\Sigma_1$ the performance of Fisher's
LDF is adequate only for small values of d. Using simulation
techniques, Chaddha and Marcus [32] compared the
behavior of three distance statistics and stated that Mahalanobis
$\Delta^2$ and Anderson-Bahadur distance are similar in performance
and superior to Reyment's generalized distance. Fukunaga
and Krile [33], using the distribution of the quadratic
discriminant function, expressed the probability of error as an
integral and applied the technique to data reported previously
[34].

2.1.3 Classification Into Multiple MVN Populations 

The problem of optimally classifying an observation into
one of M populations under the assumption of general means
and covariance matrices and obtaining the error rates has
received little attention compared to the previous cases. The
reasons are severalfold. Derivation of the classification
statistics, so popular in some restricted cases, comes to a
halt when faced with the requirement of a joint distribution
of M quadratic forms. The solution, if not outright impossible,
is certainly of dubious practical value. Therefore, the
assumption of equality of the covariance matrices, accompanied
by linearization of the discriminant functions, is widespread.
Cacoullos [35] considered the case when the distribution of
X in $\omega_i$ is $N(\mu_i,\Sigma)$, $i=1,\ldots,M$, and assigned X to the closest
$\omega_i$ in the Mahalanobis distance sense. Lachenbruch [36] compared
the ML rule with Fisher's LDF, the parameters of which are
the eigenvalues of a certain matrix. He concluded that when
the means are arranged in a simplex, the ML rule performs
much better than the LDF and only when the means are collinear
is Fisher's LDF performance comparable to the ML method.

In general, multiple group classification is comparatively 
unexplored, the corresponding error expressions particularly 
so. In order to make the mathematics tractable, simplifying 
assumptions have generally been invoked. The assumption of 
equal covariance matrices reduces the dimensionality of the 
problem by linearizing the decision boundaries. Hence, 
an otherwise quadratically partitioned feature space is now 
divided by hyperplanes. In many cases this can lead to exact 
error expressions. However, the practicality and usefulness 
of this procedure is open to question. When the multiple group
classification problem is detection of known signals embedded
in Gaussian noise, the covariance matrices are indeed equal.
In other applications such as classification of various
agricultural cover types, however, such an assumption is groundless
due to the stochastic nature of the signal itself. The
available error estimation techniques are generally of the
empirical Monte-Carlo type.

In addition to the references cited, there are various 
review articles and bibliographies on classification error 
estimation. Some of the most comprehensive ones are by
Anderson et al. [6], Subrahmanian [37], Cacoullos and Styan
[38], Lachenbruch [39] and Toussaint [40].

2.2 The PMC as a Multiple Integral 

The classification of a multidimensional observation
vector into one of M populations is conceptually identical to
the binary case. Let $\Omega$, M and N be the feature space, number
of classes and the dimensionality of $\Omega$, respectively. The
procedure is to divide $\Omega$ into M mutually disjoint sets, $\Gamma_i$,
and to assign each feature vector to a set in accordance with
an appropriate rule. This is illustrated in Fig. 2-1.

The estimation of the classification accuracy using the 
Monte-Carlo technique is possible but frequently undesirable 
because of accuracy and repeatability limitations and the data 
dependent nature of the calculation. Therefore, an analytical 
formulation of the error estimation is sought. Let $Z_i$,
$i=1,2,\ldots,M$ partition $\Omega$ in $R^N$. The Bayes risk is defined as
[41]

    $R = \sum_{i=1}^{M} \int_{Z_i} \sum_{j=1}^{M} P(\omega_j)\,C_{ij}\,f(X|\omega_j)\,dX$    (2-6)

where $C_{ij}$ is the cost of deciding $\omega_i$ when $\omega_j$ is true. In
the case where $C_{ij} = 0$ for $i=j$ and $C_{ij} = 1$ for $i \neq j$, R is
the probability of error.
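
The step from the Bayes risk to the probability of error can be written out explicitly. The following is a standard reduction added for illustration, not a derivation taken from this work; it uses only the fact that the sets $Z_i$ partition the feature space, so that the class densities integrate to one over their union.

```latex
% Zero-one cost: C_{ij} = 0 for i = j, C_{ij} = 1 for i \neq j
R = \sum_{i=1}^{M} \int_{Z_i} \sum_{j \neq i} P(\omega_j)\, f(X|\omega_j)\, dX
  = \sum_{j=1}^{M} P(\omega_j) \Bigl[ 1 - \int_{Z_j} f(X|\omega_j)\, dX \Bigr]
  = 1 - \sum_{i=1}^{M} P(\omega_i) \int_{Z_i} f(X|\omega_i)\, dX
```

The last sum is the overall probability of correct classification, so with this cost assignment R is its complement.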

Among all of the possible choices of $Z_i$, the Bayes rule
partitions $\Omega$ into $Z = Z^*$ such that $R = R^*$ is the minimum
probability of error [41]. Assuming that the population
statistics follow the multivariate normal law, the optimum Bayes rule
is as follows [42]

    $X \in \omega_i$ if $W_i < W_j$ $\forall\, j \neq i$,  $i=1,2,\ldots,M$

where

    $W_i = (X-\mu_i)^T \Sigma_i^{-1} (X-\mu_i) + \ln|\Sigma_i| - 2\ln P(\omega_i)$    (2-7)

with $X$ = observation vector
     $\mu_i$ = mean vector for class $\omega_i$
     $\Sigma_i$ = covariance matrix for class $\omega_i$
     $P(\omega_i)$ = a priori probability for class $\omega_i$

The error estimate based on direct evaluation of (2-6)
exhibits all the desired properties outlined previously.
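
The decision rule of (2-7) can be sketched in a few lines. This is an illustrative sketch rather than the software used in this work: the function names and the two-class test statistics are hypothetical, and the prior term is written as minus twice the log prior, the form the minimum-error rule takes for normal densities.

```python
import numpy as np

def discriminant(x, mean, covariance, prior):
    """W_i of (2-7): Mahalanobis term + log-determinant - 2 ln prior."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    mahalanobis = float(diff @ np.linalg.solve(covariance, diff))
    _, log_det = np.linalg.slogdet(covariance)
    return mahalanobis + log_det - 2.0 * np.log(prior)

def bayes_classify(x, means, covariances, priors):
    """Return the index i of the class with the smallest W_i."""
    scores = [discriminant(x, m, c, p)
              for m, c, p in zip(means, covariances, priors)]
    return int(np.argmin(scores))

# Hypothetical two-class, two-feature statistics with unequal covariances.
means = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
covariances = [np.eye(2), np.array([[2.0, 0.5], [0.5, 1.0]])]
priors = [0.5, 0.5]
label_a = bayes_classify([0.2, -0.1], means, covariances, priors)
label_b = bayes_classify([2.8, 3.1], means, covariances, priors)
print(label_a, label_b)
```

Assigning a vector to a class is therefore inexpensive; the difficulty discussed next lies in integrating the class densities over the regions that this rule defines.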

The evaluation of multiple integrals bears little re- 
semblance to their one dimensional counterparts, mainly due 
to the vastly different domains of integration. Whereas there 
are three distinct regions in one dimension; finite, singly 
infinite, and doubly infinite; in an N dimensional space there 
can be potentially an infinite variation of domains. Thus, 
the established one dimensional integration techniques do not, 
in general, carry over to an N dimensional space. Therefore, 
it is not surprising that no systematic technique exists for 
the evaluation of multivariate integrals. The available 
methods are generally applicable to elementary regions and
integrals. 

Let us examine the domains of integration encountered in
the Bayes error estimation. The regions of integration, $\Gamma_i$,
are defined by the inequality $W_i < W_j$ $\forall\, j \neq i$. Therefore, $\Gamma_i$
is defined by a set of intersecting hyperquadratics, the
mathematical representation of which is too complicated to
be practical. The population statistics, of course, determine
the geometrical shape of a boundary. The most tractable
geometry results from the assumption of identical covariance
matrices, $\Sigma_i = \Sigma$ $\forall\, i$. An orthonormal transformation reduces
$\Sigma$ to an identity matrix; hence, each discriminant function $W_i$
defines a hypersphere centered at the population mean in the
transformed coordinate system. Such an arrangement leads to
hyperplanes as optimum partitions of the feature space.
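
A transformation of this kind can be sketched as follows. The function name `whitening_transform` and the example matrix are hypothetical. Note that the sketch follows the orthonormal rotation with a rescaling of each axis by the inverse square root of the corresponding eigenvalue; this scaling step is an assumption made here, since a purely orthonormal rotation only diagonalizes the covariance matrix.

```python
import numpy as np

def whitening_transform(covariance):
    """Return A such that A @ covariance @ A.T is the identity."""
    # Orthonormal eigenvectors rotate the covariance to diagonal form.
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    # Rescaling each rotated axis yields unit variance in every direction.
    return np.diag(eigenvalues ** -0.5) @ eigenvectors.T

# Hypothetical common covariance matrix shared by all classes.
common_cov = np.array([[4.0, 1.2], [1.2, 1.0]])
A = whitening_transform(common_cov)
whitened = A @ common_cov @ A.T
print(np.round(whitened, 6))
```

In the transformed coordinates every class has the identity covariance, so comparing two discriminant functions cancels the quadratic term and leaves a linear boundary.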

The assumption of equal covariance matrices, albeit 
unrealistic, is prevalent in the statistical classification 
literature and has its roots in the linear property of the 
boundaries. Fig. 2-2 shows a case of four populations with 
two features.

In approximating the solution to any multiple integral,
the parameters to be determined are a set of weighting factors,
$w_1,w_2,\ldots,w_m$, and a set of points, $p_1,p_2,\ldots,p_m$, in Z. Then
(2-6) can be represented as a finite Riemann sum

    $\int_Z f(X)\,dV \approx \sum_{i=1}^{m} w_i f(p_i)$    (2-8)

In order to illustrate the difficulties involved with
evaluating (2-8), an examination of its one dimensional counterpart

    $\int_a^b k(x) f(x)\,dx \approx \sum_{i=1}^{m} w_i f(p_i)$    (2-9)

where $k(x)$ is a weighting function, is useful. One way to
evaluate (2-9) is to pre-select $p_i$ according to a certain rule
and require that $w_1,\ldots,w_m$ be chosen such that

    $e = \int_a^b k(x) f(x)\,dx - \sum_{i=1}^{m} w_i f(p_i)$    (2-10)

is zero for all monomials of degree n. The Newton-Cotes in- 
tegration technique is a prime example of this rule where the 
interval (a,b) is divided into m equal subintervals of length 
(b-a)/m. Among other well known methods having this property 
are the trapezoidal and Simpson's rule. 
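
An equally spaced rule of this family can be sketched as below. The function name `composite_simpson` and the test integrand are hypothetical, and the weighting function k(x) is taken to be unity, which is an assumption of this sketch.

```python
def composite_simpson(func, a, b, m):
    """Composite Simpson's rule on m equal subintervals (m even)."""
    if m <= 0 or m % 2 != 0:
        raise ValueError("m must be a positive even integer")
    h = (b - a) / m
    total = func(a) + func(b)
    for k in range(1, m):
        # Interior points alternate between weights 4 and 2.
        weight = 4.0 if k % 2 == 1 else 2.0
        total += weight * func(a + k * h)
    return total * h / 3.0

# Simpson's rule integrates cubics exactly: the integral of x^3 on [0, 2] is 4.
cubic_integral = composite_simpson(lambda x: x ** 3, 0.0, 2.0, 4)
print(cubic_integral)
```

The abscissas are fixed in advance by the equal spacing, so only the weights are available to make the error term vanish.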

Sometimes, it is advantageous to have a set of points with
unequal spacing. The most common choice is when $p_1,\ldots,p_m$
are the m zeros of an orthogonal polynomial $P_m(x)$. There are
numerous methods, each using a particular set of polynomials to
generate the desired abscissas [43]; among them are the
Chebyshev orthogonal polynomials of the first and second kind. This
approach provides a relation similar to (2-9) except that the
rule is exact for all polynomials of degree 2m-1. A notable
example is the m-point integration rule of the Gauss type.
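
The degree 2m-1 exactness can be checked numerically. The sketch below uses the Legendre polynomials (unit weighting function) rather than the Chebyshev families mentioned above, because NumPy provides their zeros and weights directly through `numpy.polynomial.legendre.leggauss`; the function name `gauss_legendre` and the test polynomial are hypothetical.

```python
import numpy as np

def gauss_legendre(func, a, b, m):
    """m-point Gauss-Legendre rule mapped from [-1, 1] to [a, b]."""
    nodes, weights = np.polynomial.legendre.leggauss(m)
    half_width = 0.5 * (b - a)
    midpoint = 0.5 * (a + b)
    return half_width * np.sum(weights * func(midpoint + half_width * nodes))

# Three points integrate a degree-5 polynomial exactly (2m - 1 = 5).
approx = gauss_legendre(lambda x: x ** 5 + x ** 4, -1.0, 1.0, 3)
print(approx)
```

The exact value of this integral is 2/5, obtained here from only three function evaluations because both the abscissas and the weights are free parameters.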

The extension of one dimensional techniques to higher 
dimensional spaces is hindered for a variety of reasons. 

As pointed out previously, orthogonal polynomials play an
important part in the evaluation of one dimensional integrals;
however, there is no generalization of such methods to higher
dimensions. For example, given m points $p_1, \ldots, p_m$ in $R^2$
it may not be possible to find a polynomial, in x and y, to take
on prescribed values at the points $p_i$. The next item is the
more complicated structure of the N-dimensional functions
which necessarily causes complicated domains of integration
such as (2-7).


2.2.1 Decision Boundaries 

In a series of papers, Cooper explored various decision
boundaries arising in a pattern classification problem, with
the emphasis on the optimality of some well known rules under
more general conditions. Hyperspheres arising from spherically
symmetric distributions were found to be optimum for
Pearson Type II and Type VII in addition to a normal
distribution [44,45]. Error expressions were obtained by integration
of the random measurement vector |x| within the constant radius
sphere. For the more general case, quadratic partitions were
claimed to be optimum for not only normal populations but for
the general class of monotone distributions with equal
determinant covariance matrices [46]. In the latter case, the
statistics, not the functional form of the class density
functions, are the only required parameters.

Although multiclass, multifeature data classification
is straightforward, the probability of error estimation
through non-Monte Carlo techniques shows only a structural
similarity to the binary, unidimensional case. It is significant
that the complexity of the classification accuracy estimation
in the general case under study is mainly a function
of the dimensionality of the feature space and only partly a
function of the population multiplicity. An exact error
expression, for example, exists for M-class, single dimension
Bayesian classification.

The integral expressing the error is an Nth order multiple
integral over domains defined by (2-7). Consider Fig. 2-2.
The region of interest which would yield the highest
probability of correct classification for class $\omega_1$ is a
triangle.

Let 



then $\Gamma_1$ is a set defined by the following simultaneous
inequalities

$$x_2 < m/2, \qquad x_2 > -(x_1 + m), \qquad x_2 > x_1 - m$$

Hence

$$P_{c|\omega_1} = \iint_{\Gamma_1} f(X|\omega_1)\,dX
  = 2 \int_{x_1=0}^{3m/2} \int_{x_2=x_1-m}^{m/2} f(x_1,x_2|\omega_1)\,dx_2\,dx_1 \tag{2-11}$$


Therefore, $P_{c|\omega_1}$ can be evaluated to any degree of precision
desired. The point of this simple example was to demonstrate
the importance of the boundaries of $\Gamma_1$. The ease of formulation
was mainly due to the linear contours of integration,
precipitated by the equal covariance matrices.
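
As a rough numerical illustration of (2-11), the sketch below
evaluates the probability mass inside the triangle defined by the
three inequalities. The class density is not recoverable from this
copy, so the code assumes, purely for illustration, a bivariate
normal density with zero mean and identity covariance. It compares
the symmetric form of (2-11), with the inner integral done in
closed form, against a brute-force sum over the whole triangle. All
function names are hypothetical.

```python
import math


def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def p_correct_symmetric(m, steps=2000):
    """2 * int_0^{3m/2} pdf(x1) * [cdf(m/2) - cdf(x1 - m)] dx1 (midpoint rule)."""
    h = 1.5 * m / steps
    total = 0.0
    for k in range(steps):
        x1 = (k + 0.5) * h
        total += norm_pdf(x1) * (norm_cdf(0.5 * m) - norm_cdf(x1 - m))
    return 2.0 * total * h


def p_correct_bruteforce(m, steps=400):
    """Midpoint sum of the density over points satisfying all three inequalities."""
    lo1, lo2 = -1.5 * m, -m
    h1, h2 = 3.0 * m / steps, 1.5 * m / steps
    total = 0.0
    for i in range(steps):
        x1 = lo1 + (i + 0.5) * h1
        for j in range(steps):
            x2 = lo2 + (j + 0.5) * h2
            if x2 < 0.5 * m and x2 > -(x1 + m) and x2 > x1 - m:
                total += norm_pdf(x1) * norm_pdf(x2)
    return total * h1 * h2
```

Under the assumed density the two routines agree to within the
resolution of the brute-force grid, which reflects the point made
above: with linear boundaries the limits of integration can be
written down directly.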

Relaxing the equal covariance matrices assumption considerably
complicates the problem. First, there is no transformation,
unitary or otherwise, that would decouple the
feature space for all the populations simultaneously; and
second, the boundaries of interest are now portions of various
hyperquadratics. These two changes alone would rule out any
meaningful representation of $\hat{P}_c$ in a form similar to (2-11).
Fig. 2-3 shows a typical multiclass case.

The dimensionality of feature space can be regarded as
the most important complicating factor. There are at least
three parameters dependent on N.

1. order of the error integral

2. geometry of the $\Gamma_i$'s

3. computation time

The existence (or lack of it) of techniques in evaluation of
multiple integrals has been discussed before. While it could
be argued that the multiplicity of populations, M, has a
more pronounced effect on the decision regions, it is undoubtedly
true that there are no complex boundaries in one dimension
regardless of the value of M. In addition, boundary
visualization, so helpful in error estimation, will no longer be
possible for N>3. It will be shown in later chapters that the
computation time is related exponentially to N and linearly
to M.

In this section, the difficulties associated with a
direct evaluation of the multidimensional classification error
integral were discussed. The parameters of the problem, in
order of decreasing contribution to the problem complexity,
are listed below

1. Inequality of covariance matrices 

2. Dimensionality of feature space 

3. Multiplicity of populations 

2.3 Approximation to the Classification Error Integral

In the previous section the Bayes error was expressed
as a multiple integral over $R^N$, the N dimensional Cartesian
coordinate system of the feature space. The underlying
difficulties in evaluation of (2-6) were attributed to the
intractable mathematical description of the contours of $\Gamma_i$,
and the N-th order multiple integral over an arbitrarily shaped
domain. There are two transformations that would circumvent
these problems.

2.3.1 Coordinate Transformation 

The multispectral scanner detects the reflected electro- 
magnetic energy in a number of optical and infrared bands. 
Although these bands are essentially non-overlapping, the re- 
sponses observed are correlated. A rise in signal amplitude 
in one band is accompanied by a similar effect in an adjacent 
band. In statistical terms this property translates into a 
probability space with correlated variates.

The M populations are represented by a set of general
mean vectors and covariance matrices. Great algebraic
simplification would occur if every $\Sigma_i$ was in a diagonal form
resulting in separable density functions. This simplification stems
from the application of the product rule. Let
$X = (x_1, x_2, \ldots, x_N) \in B$ and $Y = (y_1, y_2, \ldots, y_{N'}) \in G$
be in Euclidean spaces of N and N' dimensions respectively. A
Cartesian product, $B \times G$, is a space of N+N' dimensions with
points $(x_1, x_2, \ldots, x_N, y_1, y_2, \ldots, y_{N'})$ such that
$(x_1, x_2, \ldots, x_N) \in B$ and $(y_1, y_2, \ldots, y_{N'}) \in G$.
Let there exist an m point integration rule, R, over B

$$R(f) = \sum_{i=1}^{m} a_i f(X_i) \approx \int_{B} f(X)\,dV, \qquad X_i \in B \tag{2-12}$$

and an n point integration rule, R', over G

$$R'(f) = \sum_{j=1}^{n} b_j f(Y_j) \approx \int_{G} f(Y)\,dV, \qquad Y_j \in G \tag{2-13}$$

Then the product rule of R and R' defined over $B \times G$ is given
by

$$R \times R' = \sum_{i=1}^{m} \sum_{j=1}^{n} a_i b_j f(X_i, Y_j) \approx \int_{B \times G} f(X,Y)\,dV \tag{2-14}$$
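
The product rule (2-14) can be sketched numerically with two one
dimensional rules. The example below is an illustration only: it
uses 2-point Gauss-Legendre rules (exact for cubics) as stand-ins
for R and R', and shows that the double sum applied to a separable
integrand equals the product of the two one dimensional results.
The helper names are hypothetical.

```python
import math


def gauss2(a, b):
    """2-point Gauss-Legendre rule on [a, b] as a list of (node, weight) pairs."""
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    t = 1.0 / math.sqrt(3.0)
    return [(mid - half * t, half), (mid + half * t, half)]


def apply_rule(rule, f):
    """R(f) = sum_i a_i f(X_i), as in (2-12)."""
    return sum(w * f(x) for x, w in rule)


def product_rule(rule_x, rule_y, h):
    """R x R' = sum_i sum_j a_i b_j h(X_i, Y_j), as in (2-14)."""
    return sum(a * b * h(x, y) for x, a in rule_x for y, b in rule_y)


def f(x):
    return x ** 3 + 1.0      # integral over [0, 2] is 6


def g(y):
    return y ** 2            # integral over [0, 3] is 9


rule_b = gauss2(0.0, 2.0)
rule_g = gauss2(0.0, 3.0)
separable_result = product_rule(rule_b, rule_g, lambda x, y: f(x) * g(y))
```

Here `separable_result` matches both `apply_rule(rule_b, f) * apply_rule(rule_g, g)`
and the exact value of the double integral, because each one
dimensional rule is exact for its factor.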


From these properties, it quickly follows that if R integrates
f(X) exactly over B and R' integrates g(Y) exactly over G then,
provided h(X,Y) = f(X)g(Y), R×R' integrates h(X,Y) exactly
over B×G. A brief proof of this theorem, given in [47], follows.

$$
\begin{aligned}
\int_{B \times G} h(X,Y)\,dV
  &= \int_{B \times G} f(X) g(Y)\,dV
   = \int_{B} f(X)\,dV_B \int_{G} g(Y)\,dV_G \\
  &= \sum_{i=1}^{m} a_i f(X_i) \sum_{j=1}^{n} b_j g(Y_j) \\
  &= \sum_{i=1}^{m} \sum_{j=1}^{n} a_i b_j f(X_i) g(Y_j) \\
  &= R \times R'
\end{aligned}
\tag{2-15}
$$

Potentially, this rule can reduce the dimensionality of a
problem from N to 1. Such a property is not an intrinsic
feature of the remotely sensed data, however. Moreover, there
is no transformed space in which M (M>2) covariance matrices
can be represented in diagonal form.

Since the calculation of $\hat{P}_{c|\omega_i}$ precedes the estimation
of overall classification accuracy, an M stage successive
estimation procedure in M linearly related probability spaces
can be envisioned. For example, stage i consists of the
following mapping

j=1,2,...,M (2-16)

where $\Phi_i$ is the eigenvector matrix derived from $\Sigma_i$. Therefore,
in each transformed space, $T_i(\Omega)$, $\omega_i$ has a null mean vector
and a diagonal covariance matrix. Fig. 2-4 is a pictorial
representation of (2-16) for two classes. This unitary
transformation is linear, preserves the Euclidean distance and
pairwise divergence, and the probability of error is invariant
under such mapping. It will be shown that formulation of
$P_{c|\omega_i}$ in $T_i(\Omega)$ will provide an N to 1 dimensionality
reduction.
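
The properties claimed for the stage-i transformation can be
checked on a small case. Equation (2-16) is not legible in this
copy, so the sketch below assumes the mapping has the common form
y = Phi^T (x - mean_i), with Phi the eigenvector matrix of the
class-i covariance. That form is an assumption, not the author's
stated equation. The code is restricted to two features so that the
eigenvectors can be written in closed form, and all names are
hypothetical.

```python
import math


def eig_sym2(cov):
    """Eigen-decomposition of a symmetric 2x2 matrix ((a, b), (b, d)).

    Returns (lam1, lam2) and phi, whose columns are orthonormal eigenvectors.
    """
    (a, b), (_, d) = cov
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    phi = ((c, -s), (s, c))
    lam1 = a * c * c + 2.0 * b * c * s + d * s * s
    lam2 = a * s * s - 2.0 * b * c * s + d * c * c
    return (lam1, lam2), phi


def map_point(x, mean_i, phi):
    """Assumed stage-i mapping of a point: y = phi^T (x - mean_i)."""
    dx, dy = x[0] - mean_i[0], x[1] - mean_i[1]
    return (phi[0][0] * dx + phi[1][0] * dy,
            phi[0][1] * dx + phi[1][1] * dy)


def map_cov(cov, phi):
    """Covariance of any class after the mapping: phi^T cov phi."""
    tmp = [[sum(cov[r][k] * phi[k][c] for k in range(2)) for c in range(2)]
           for r in range(2)]
    return [[sum(phi[k][r] * tmp[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]
```

Applying `map_cov` to the class-i covariance gives a diagonal matrix
whose entries are the eigenvalues, while the covariance of another
class generally stays non-diagonal, which is why the procedure is
repeated in M separate stages.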


2.3.2 Discrete Space Approach 

For any continuous formulation of a problem there exists
a discrete counterpart, the specific choice of which is dependent
upon individual cases and requirements. Let $\Omega$ be the
continuous probability space. A transformation, T, is required
such that in $T(\Omega)$, $\Gamma_i$ can be completely described in a
nonparametric form, thereby bypassing the requirement for an
algebraic representation of $\Gamma_i$. This desired transformation
would sample $\Omega$ into a grid of N-dimensional cells according
to a certain rule, thus expressing the Bayes error integral
in the discrete space of $T(\Omega)$.

The sampling of the probability space is equivalent to 
the discrete representation of the random variates along each 
feature axis. The multispectral data is generally modeled as 
a multivariate normal random process. What is required, 
therefore, is a discrete approximation to a normal random 
variable that would exhibit desirable limiting properties. 

Let $y_n \sim \mathrm{Bi}(n,p)$ be a binomial random variable with
parameters n and p. Then $x_n$ defined by

$$x_n = \frac{y_n - np}{\sqrt{np(1-p)}}, \qquad y_n = 0, 1, 2, \ldots, n \tag{2-17}$$

converges to $x \sim N(0,1)$ in distribution [48]; i.e.,

$$\lim_{n \to \infty} F_n(x) = F(x)$$

The convergence is most rapid if $p = 1/2$. Then

$$x_n = \frac{2\,(y_n - n/2)}{\sqrt{n}} \tag{2-18}$$

The variance of $x_n$ is set equal to the eigenvalue of the
transformed $\Sigma_i$ by incorporating a multiplicative factor
($\sqrt{\lambda_i}$) in (2-18).
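
The sampling rule (2-18) can be illustrated with a few lines of
code. The sketch below builds the n+1 grid points for p = 1/2,
scales them by the square root of an eigenvalue (called `lam` here,
a hypothetical name), and compares the binomial distribution
function with the normal one halfway between grid points. It is a
minimal illustration of the limiting behaviour, not the author's
implementation.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def grid_points(n, lam=1.0):
    """Points sqrt(lam) * 2 * (k - n/2) / sqrt(n), k = 0..n, following (2-18)."""
    scale = math.sqrt(lam)
    return [scale * 2.0 * (k - 0.5 * n) / math.sqrt(n) for k in range(n + 1)]


def binomial_pmf(n, k):
    """P(y_n = k) for p = 1/2."""
    return math.comb(n, k) / 2.0 ** n


def moments(n, lam=1.0):
    """Mean and variance of the discrete variable defined on the grid."""
    xs = grid_points(n, lam)
    mean = sum(binomial_pmf(n, k) * x for k, x in enumerate(xs))
    var = sum(binomial_pmf(n, k) * (x - mean) ** 2 for k, x in enumerate(xs))
    return mean, var


def max_cdf_gap(n):
    """Largest |F_n - F| evaluated halfway between neighbouring grid points."""
    half_step = 1.0 / math.sqrt(n)
    cum, gap = 0.0, 0.0
    for k, x in enumerate(grid_points(n)):
        cum += binomial_pmf(n, k)
        gap = max(gap, abs(cum - norm_cdf(x + half_step)))
    return gap
```

The grid has spacing 2*sqrt(lam)/sqrt(n) and reaches out to
±sqrt(n)*sqrt(lam), and the discrete variable has variance `lam`
for every n.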

The segmentation of $\Omega$ by a union of elementary
hypervolumes makes nonparametric representation of $\Gamma_i$ and its
contours feasible. Some comments regarding the structure of the
sampled $\Omega$ are in order. The coordinates of each cell's center
are known and given by (2-18). The spacing between the centers
is readily shown to be equal to $\delta_i = \frac{2}{\sqrt{n}}\,\sigma_i$
along the ith axis. The grid extent is therefore $\pm\sqrt{n}\,\sigma_i$
with n+1 cells along each coordinate axis. The simultaneous
solution of the set of M Nth degree polynomials is now reduced to
the identification of each cell with one of M partitions within
$\Omega$. Specifically, following the orthonormal transformation on
$\omega_i$ and sampling of $\Omega$ accordingly, each cell's coordinate
is assigned to the appropriate $\Gamma$. This process is carried on
exhaustively; therefore $\Gamma_i$ can be defined as a set such that

$$\Gamma_i = \{\cup X_n : X_n \in \Gamma_i\} \tag{2-19}$$

Fig. 2-5 shows a pictorial representation of (2-19). The
description of domains of integration as a union of elementary
units alleviates the need for the precise knowledge of the
boundary location, although the sampling grid can provide an
estimate within one $\delta$. Once the exhaustive process of
assignment is completed, $P_{c|\omega_i}$, the integral of $f(x|\omega_i)$
over $\Gamma_i$, is represented by the sum of hypervolumes over the
elementary cells within $\Gamma_i$.

Using this procedure, cumbersome implementation of
numerical integration techniques in multidimensions is
avoided. One of the main features of the orthonormal
transformations preceding the sampling process is the decoupling
of $\Sigma_i$, thereby generating the separability property of the
transformed $f(x|\omega_i)$ along each dimension. Invoking the product
rule and designating the domain of a cell, centered at the
origin within $\Gamma_i$, as $C_0$:

$$\int_{C_0} f(x|\omega_i)\,dx =
  \int_{-\delta_1/2}^{\delta_1/2} f(x_1|\omega_i)\,dx_1
  \int_{-\delta_2/2}^{\delta_2/2} f(x_2|\omega_i)\,dx_2 \cdots
  \int_{-\delta_N/2}^{\delta_N/2} f(x_N|\omega_i)\,dx_N \tag{2-20}$$

This unit of probability volume is equal to the product of
N one dimensional normal integrals, the value of which is
widely tabulated. Thus, no involved numerical procedure is
required to evaluate (2-20).
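
As a small check of (2-20), the sketch below computes the
probability mass of the cell centered at the origin as a product of
one dimensional normal integrals. It assumes a zero-mean class with
diagonal covariance and the cell widths 2*sigma_i/sqrt(n) used for
the sampling grid. The error function from the standard library
stands in for the tabulated normal integral, and the function names
are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def interval_mass(lo, hi, sigma):
    """Integral of a zero-mean normal density with std `sigma` over (lo, hi)."""
    return norm_cdf(hi / sigma) - norm_cdf(lo / sigma)


def origin_cell_mass(sigmas, n):
    """Product of N one dimensional integrals over the cell centered at the origin."""
    mass = 1.0
    for sigma in sigmas:
        delta = 2.0 * sigma / math.sqrt(n)   # cell width along this axis
        mass *= interval_mass(-0.5 * delta, 0.5 * delta, sigma)
    return mass
```

Because each width is proportional to its own standard deviation,
every axis contributes the same factor, erf(1/sqrt(2n)), so the
result depends only on n and on the number of features.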

The relationship expressed in (2-20) is the building
block in the probability of error estimation. Referring to
this algorithm as 'Controlled Space Partitioning' (CSP), we can
write the conditional classification accuracy estimate as

$$\hat{P}_{c|\omega_i} = \sum_{C}
  \int_{c_1-\delta_1/2}^{c_1+\delta_1/2} f(x_1|\omega_i)\, I_i(C)\,dx_1
  \int_{c_2-\delta_2/2}^{c_2+\delta_2/2} f(x_2|\omega_i)\, I_i(C)\,dx_2 \cdots
  \int_{c_N-\delta_N/2}^{c_N+\delta_N/2} f(x_N|\omega_i)\, I_i(C)\,dx_N \tag{2-21}$$

$$\hat{P}_c = \sum_{i=1}^{M} P(\omega_i)\, \hat{P}_{c|\omega_i}$$

where

$$I_i(C) = \begin{cases} 1 & \text{if } C \subseteq \Gamma_i \\ 0 & \text{otherwise} \end{cases}$$

C = the domain of an elementary cell

Fig. 2-6 is a geometrical representation of (2-21).
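
The CSP estimate (2-21) can be sketched for two features as
follows. This is a minimal illustration under stated assumptions,
not the author's program: class i is taken to be already
transformed to zero mean and diagonal covariance, each competing
class is given by a prior, a mean and a 2x2 covariance in that same
space, and a cell is assigned to class i when its center satisfies
the Bayes rule. All function and parameter names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def log_gauss2(x, mean, cov):
    """Log of a bivariate normal density with covariance ((a, b), (b, d))."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    quad = (d * dx * dx - 2.0 * b * dx * dy + a * dy * dy) / det
    return -0.5 * quad - 0.5 * math.log(det) - math.log(2.0 * math.pi)


def csp_conditional_accuracy(sigmas, prior_i, others, n):
    """Estimate of P(correct | class i) on an (n+1) x (n+1) grid.

    sigmas : std deviations of class i along its two decorrelated axes
    others : list of (prior, mean, cov) for the competing classes
    """
    cov_i = ((sigmas[0] ** 2, 0.0), (0.0, sigmas[1] ** 2))
    deltas = [2.0 * s / math.sqrt(n) for s in sigmas]
    centers = [[s * (2 * k - n) / math.sqrt(n) for k in range(n + 1)]
               for s in sigmas]

    def mass(c, axis):
        # One dimensional normal integral over the cell along `axis`.
        s, half = sigmas[axis], 0.5 * deltas[axis]
        return norm_cdf((c + half) / s) - norm_cdf((c - half) / s)

    total = 0.0
    for c1 in centers[0]:
        m1 = mass(c1, 0)
        for c2 in centers[1]:
            score_i = math.log(prior_i) + log_gauss2((c1, c2), (0.0, 0.0), cov_i)
            in_region = all(
                score_i >= math.log(p) + log_gauss2((c1, c2), mu, cov)
                for p, mu, cov in others)
            if in_region:                      # indicator I_i(C) = 1
                total += m1 * mass(c2, 1)
    return total


def overall_accuracy(priors, conditional):
    """Prior-weighted sum of the conditional accuracy estimates."""
    return sum(p * pc for p, pc in zip(priors, conditional))
```

For two equally likely classes with identity covariance and means
2.1 apart, the exact conditional accuracy is the normal
distribution function evaluated at 1.05, and the estimate
approaches it as n grows.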

2.4 Error Analysis 

Formulation of a problem with inherently continuous
parameters in a discrete space as a means of approximation or
estimation of the end product necessarily incurs errors that
need to be studied. Error terms cannot be expressed in the
form of exact expressions and can only be bounded or put in
some defined statistical model; otherwise, the approximation
would be exact. Extension of the one dimensional integration
error analysis results does not appear to be possible due to
the lack of any correspondence between the unidimensional and
multidimensional integration domains. In the multivariate
integration field, the errors studied are related to simple
domains such as hypercubes, simplexes, etc. [49].

There are basically two types of error encountered in the
implementation of the CSP algorithm. Type I error originates
in the one dimensional integration of a normal density function
over the region (a,b). This quantity is available both in
tabular form and as FORTRAN callable subprograms which
are capable of supplying arbitrarily high accuracy results.

Type II error occurs only at the boundary of $\Gamma_i$ because the
sampled grid essentially estimates the location of such contours.
A self cancelling property of this type of error is brought about
by the geometrical structure of the regions when: (a) $f(X|\omega_i)$ is
integrated over the whole elementary hypercube instead of a
portion inside $\Gamma_i$ (grid point $X_n$ close to the boundary and
$X_n \in \Gamma_i$); and (b) $f(X|\omega_i)$ is not integrated over a portion
of $\Gamma_i$ (grid point $X_n$ close to the boundary but
$X_n \notin \Gamma_i$). Case (a) adds a positive bias and case (b) adds a
negative bias to the result of integration, $\hat{P}_c$. For a sampling
grid with fine subdivision and over the ensemble location of all
the boundaries, the events

$$\{X_n \in \Gamma_i \ \text{or} \ X_n \notin \Gamma_i \mid X_n \ \text{near the boundary}\} \tag{2-22}$$

have equal likelihood; hence, positive and negative biases
occur equally often. Fig. 2-7 shows the structure of type
II error in 2 dimensions.

2.4.1 Statistical Properties of the Estimate 
The error encountered in estimating the classification
error is primarily of type II. Much insight into the structure
of this error is obtained by examining the problem in one
dimension. Let $f(x|\omega)$ be a class conditional Gaussian density
function, N(0,1), and let $x_b$ be a fictitious unknown boundary
possibly separating $\omega$ from some other population. A grid of
size n is set up and it is determined whether $x_{n_0} \in \Gamma_i$ or
$x_{n_0} \notin \Gamma_i$, Fig. 2-8. $x_{n_0}$ has equal likelihood of being
inside or outside $\Gamma_i$. Equivalently, it can be stated that $x_b$
is random with uniform variations within one $\delta$, i.e.,

$$x_b \sim U\left(x_{n_0} - \frac{\delta}{2},\; x_{n_0} + \frac{\delta}{2}\right) \tag{2-23}$$

The error in estimating the area of $\Gamma_i$ can therefore be
represented as

$$e = \int_{x_{n_0}}^{x_b} f(x)\,dx \tag{2-24}$$

which, depending on $x_b$, can take on either positive or
negative values. The expected value of e is

$$E\{e\} = Q(x_{n_0}) - \frac{1}{\delta} \int_{x_{n_0}-\delta/2}^{x_{n_0}+\delta/2} Q(x_b)\,dx_b \tag{2-25}$$

The variation of $E\{e\}$ vs. $x_{n_0}$ is plotted in Fig. 2-9.
Examination of $E\{e\}$ shows that although small, it is
mostly negative. Its magnitude decreases with increasing
n and $x_{n_0}$. These properties can also be deduced geometrically.
The negative bias is due to the following obvious inequality

$$Q(a) - Q(b) > Q(c) - Q(d)$$

given that

$$a < b < c, \qquad c - a > 0, \qquad d - b > 0, \qquad a, b, c, d > 0 \tag{2-26}$$

Fig. 2-8 shows two cases where the closest cell to the boundary
is considered either inside ($x_{n_0} \in \Gamma_i$) or outside
($x_{n_0} \notin \Gamma_i$) the decision region, resulting in an over- and
underestimation of correct classification, respectively. From
(2-26) it then follows that the magnitude of negative bias is
greater than that of positive bias. Thus, this procedure gives an
estimator with a net negative bias.
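
The sign and size of the one dimensional bias can be explored
numerically. The sketch below assumes that the expected error has
the form Q(x) minus the average of Q over one cell of width delta
centered at x, where Q is the Gaussian tail probability and x is
the position of the grid point nearest the boundary. This reading
of the garbled expression is an assumption, and the function names
are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability, P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def expected_boundary_error(x_center, delta, steps=2000):
    """Q(x_center) minus the mean of Q over (x_center - delta/2, x_center + delta/2).

    The mean is computed with a midpoint rule using `steps` subintervals.
    """
    h = delta / steps
    lo = x_center - 0.5 * delta
    mean_q = sum(q_function(lo + (k + 0.5) * h) for k in range(steps)) * h / delta
    return q_function(x_center) - mean_q
```

Since Q is convex for positive arguments, its average over a cell
exceeds its value at the cell center, which gives a small negative
result there. The magnitude shrinks as the cell width decreases
(larger n) and as the cell moves farther from the mean.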

A different situation exists when the region of integration
is doubly connected as in Fig. 2-10. In this case the
shift of a cell center from just inside the boundary to just
outside produces an opposite effect. Whereas in the previous
case such a shift would have reversed the sign of the
bias term from positive to negative with an increase in
magnitude, in the new domain the net change in bias will be
positive simply because the inside-to-outside move now is
toward the mean rather than away from it.

Fig. 2-9. Expected value of an area estimator using
equidistant samples. Horizontal axis: boundary location.

In N dimensions the total error in estimating the
conditional probability of correct classification can be
represented by a weighted sum of the boundary errors

$$e_T = \sum_{i=1}^{N_B} w_i e_i \tag{2-27}$$

where $N_B$ is the number of cells along the boundary, $w_i$
is the weighting sequence and $e_i$ is the N dimensional error
associated with the ith boundary cell. In order to obtain
a variance expression for $e_T$, the statistical properties of
$e_i$ need to be determined. From (2-22) it follows that the
location of the boundary is uniformly distributed within
one boundary cell width. In general, it does not follow
that the volume error is also uniformly distributed within
one cell volume. This is strictly true only in cases where
the decision boundary and the boundary of a cell are
'parallel'. Adoption of a uniform distribution assumption for
$e_i$, however, provides a considerable simplification in the
derivation of an expression for the variance of the error.
With regard to the first expression for the variance of $e_T$
it should be noted that the assumption of a uniformly
distributed $e_i$ generates a variance higher than the true
value. Thus, the resulting expression can be taken as an
upper bound on the variance of $e_T$. Let

$$e_i \sim U\left(-\frac{v_c}{2},\; \frac{v_c}{2}\right) \tag{2-28}$$

where $v_c$ is the volume of an elementary cell given by

$$v_c = \prod_{i=1}^{N} \delta_i \tag{2-29}$$

The contribution of $e_i$ to the total integration error
should clearly be weighted to incorporate the effect of
boundary location relative to the mean of $\omega_i$. An appropriate
choice for the weighting sequence $w_i$ is the height of $f(x|\omega_i)$
at a particular boundary point. A weighting process such as
this effectively assigns a 'significance' to each $e_i$. Although
the magnitude of $e_i$ may have been large in the context of
volume approximation of $\Gamma_i$, if the normalized distance of
such a cell to $\mu_i$ is large it generates negligible volume under
$f(x|\omega_i)$.

In order to obtain a variance expression for $e_T$ the small
bias is assumed negligible. The variance of individual
errors, $e_i$, is

$$\mathrm{Var}\{e_i\} = \frac{v_c^2}{12} = \sigma_{e_i}^2 \tag{2-30}$$

Therefore,

$$
\begin{aligned}
\mathrm{Var}\{e_T\} = E\{e_T^2\}
  &= E\Big\{\sum_{i=1}^{N_B} \sum_{j=1}^{N_B} w_i w_j e_i e_j\Big\} \\
  &= \sum_{i=1}^{N_B} w_i^2\, \overline{e_i^2}
   + \sum_{i=1}^{N_B} \sum_{\substack{j=1 \\ j \ne i}}^{N_B} w_i w_j\, \overline{e_i e_j} \\
  &= \sum_{i=1}^{N_B} w_i^2 \sigma_{e_i}^2
   + \sum_{i=1}^{N_B} \sum_{\substack{j=1 \\ j \ne i}}^{N_B} w_i w_j \rho_{ij} \sigma_{e_i} \sigma_{e_j}
\end{aligned}
\tag{2-31}
$$

where $\rho_{ij}$ is the correlation coefficient between $e_i$ and $e_j$.
Obtaining an analytical expression for $\rho_{ij}$ in N dimensions is
theoretically feasible; however, its complexity considerably
diminishes the usefulness of (2-31). Because the $\rho_{ij}$ are
small for widely separated boundary cells, a reasonable
approximation to $\mathrm{Var}\{e_T\}$ is given by

$$\mathrm{Var}\{e_T\} \approx \sum_{i=1}^{N_B} w_i^2 \sigma_{e_i}^2 \tag{2-32}$$

Expanding (2-32)

$$\sigma_{e_i}^2 = \frac{1}{12} \Big(\prod_{k=1}^{N} \delta_k\Big)^2 \tag{2-33}$$

$$w_j = \frac{1}{(2\pi)^{N/2} \prod_{k=1}^{N} \sigma_k}\; e^{-\frac{1}{2} x_j^T \Sigma_i^{-1} x_j}, \qquad x_j \in \Gamma_b \tag{2-34}$$

where $\Gamma_b$ is the boundary domain of $\Gamma_i$. Substituting
$\delta_k = \frac{2}{\sqrt{n}}\,\sigma_k$ in (2-33)

$$\sigma_{e_i}^2 = \frac{1}{12} \Big(\prod_{k=1}^{N} \frac{2}{\sqrt{n}}\,\sigma_k\Big)^2
  = \frac{2^{2N}}{12\, n^N} \prod_{k=1}^{N} \sigma_k^2 \tag{2-35}$$

Therefore, (2-32) is equal to

$$\mathrm{Var}\{e_T\} \approx \frac{1}{12} \Big(\frac{2}{\pi n}\Big)^N \sum_{i=1}^{N_B} f_i^2 \tag{2-36}$$

where $f_i$ is the exponential part of (2-34).

The variation of $\mathrm{Var}\{e_T\}$ vs. n and N cannot be fully
explored due to the $\sum f_i^2$ factor which is problem-dependent.
However, it does follow that $\hat{P}_c$ is convergent in the
mean-square sense;

$$E\{[\hat{P}_c - P_c]^2\} = \mathrm{Var}\{e_T\} \to 0 \quad \text{for } n \to \infty \tag{2-37}$$

This observation is less than obvious since as n increases
so do $N_B$ and $\sum f_i^2$ and, hence, could potentially be
self-canceling. Although the increase of $N_B$ and n are monotonic,
experimental evidence suggests that $N_B$ as a percentage of cells
within $\Gamma_i$ steadily decreases, so while more cells are allocated
to $\Gamma_i$, comparatively fewer ones reside near the boundary.
Therefore, $\sum f_i^2$ only slows the convergence of the variance
towards zero. It also follows from (2-37) that variance
decreases for high recognition rates (i.e., small $f_i$).
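
The algebra leading to the closed form of the variance bound can be
cross-checked numerically. The sketch below evaluates the bound in
two ways: directly as (1/12)(2/(pi n))^N times the sum of squared
exponential factors, and step by step from the cell volume and the
density normalization. The exponential factors passed in are
arbitrary illustrative numbers, not values from the study, and the
function names are hypothetical.

```python
import math


def variance_bound_closed_form(n, num_features, f_values):
    """(1/12) * (2 / (pi * n))**N * sum of f_i**2."""
    return (2.0 / (math.pi * n)) ** num_features * sum(f * f for f in f_values) / 12.0


def variance_bound_from_parts(n, sigmas, f_values):
    """Sum of w_i**2 * sigma_e**2 with uniform volume errors in each boundary cell."""
    num_features = len(sigmas)
    cell_volume = math.prod(2.0 * s / math.sqrt(n) for s in sigmas)
    sigma_e_sq = cell_volume ** 2 / 12.0                  # variance of U(-v/2, v/2)
    norm = (2.0 * math.pi) ** (num_features / 2.0) * math.prod(sigmas)
    return sum((f / norm) ** 2 * sigma_e_sq for f in f_values)
```

The two routines agree for any choice of standard deviations, which
shows that the feature variances cancel and only n, N and the
problem-dependent sum of squared exponential factors remain.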

2.5 General Comments

Formulation of a problem in an N-dimensional space
requires coping with situations not present in the single
dimension case. In addition to the mathematical complexities, the
practicality of implementation of any method should be closely
examined. In particular, with the digital computer capability
and its cost as the ultimate limiting factor, the computation
time of processing in an N-dimensional domain takes on a prime
importance.

Techniques requiring exhaustive enumerations can be
potentially expensive, in many instances totally beyond the
available resources. The implementation of (2-21) requires
processing of an N-dimensional grid of points. With (n+1)
cells along each axis, there are $(n+1)^N$ such points to be
allocated to their respective domains. In one dimension, very
accurate estimates can be obtained long before the size of
n+1 presents any computational difficulties. The
multidimensional case is different. The exponential rise of the
grid size with N can make the execution time prohibitively
long. This "dimensionality effect" can effectively generate a
computational barrier and thus render the algorithm inoperative
if n is "too large".
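
The growth of the grid with the number of features is easy to
tabulate. The short sketch below counts the cells for a few values
of n and N. The particular values are illustrative and the function
names are hypothetical.

```python
def grid_cells(n, num_features):
    """Number of cells in a grid with n + 1 cells along each of N axes."""
    return (n + 1) ** num_features


def growth_table(n_values, feature_counts):
    """Cell counts for every (n, N) combination, keyed by (n, N)."""
    return {(n, k): grid_cells(n, k) for n in n_values for k in feature_counts}


table = growth_table([8, 16], [1, 2, 4, 6])
```

For example, going from four to six features at n = 16 multiplies
the number of cells by 17 squared, or 289.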

The quality of the estimate as shown in sec. 2.4 is 
dependent on the size of n; i.e., the grid fineness. So the 
central question is whether n can be large enough to generate 
a high quality estimate and yet small enough to make the 
estimation process feasible. The sampling grid, the algorithm 
and remotely sensed data itself have properties that help 
answer this question in the affirmative. The current MSS 
system in operation collects data in four spectral bands. It 
is believed that future space platforms primarily Landsat C 
will be equipped with scanners having data collection cap- 
ability not beyond five spectral bands. Therefore, N for all 
useful purposes is limited to the 4 to 6 range. Actually, 
optimum processes may not utilize all of these bands due to 
their redundancy. 

The next question is the relative magnitude of n. The
answer lies partly in the outer location of the desired
boundaries and the fact that the sampling grid must cover the
entire relevant domains. It turns out that in most cases of
interest, $\Gamma_i$ is a simply connected domain but $Z_i$ is a doubly
connected domain. This approach toward the evaluation of
the classification performance through the estimation of
probability of correct classification ensures that sampling
of the probability space is confined to a closed finite
domain of $\Gamma_i$, thus alleviating the need to sample $Z_i$, a far
larger region. Having established that $\Gamma_i$ is bounded in many
cases, the question now is whether the outer limits of the
grid will encompass the appropriate boundary and thus sample
$\Gamma_i$ thoroughly with a reasonable magnitude for n.

Define $g_{e_i}$ as the extent of the sampling grid along the
ith transformed feature axis. From sec. 2.3.2, $|g_{e_i}| = \sqrt{n}\,\sigma_i$.
Although no such quantity can be precisely defined for $\Gamma_i$,
let $r_{e_i}$ be the outer limits of $\Gamma_i$ in some average sense.
Two cases can be distinguished: (a) $r_{e_i} < |g_{e_i}|$, in which case
clearly the grid has sampled the entire domain of interest;
and (b) $r_{e_i} > |g_{e_i}|$, a condition which either means that n is
exceedingly small or that $r_{e_i}$ is located very many $\sigma_i$'s away
from $\mu_i$. In this case any error committed but unaccounted
for will have very small effect on the outcome due to the
negligible volume under a normal density function for
anything more than a few standard deviations from the origin.
Since in most applications n ≥ 8, the grid extent will be
> ±2.2σ and will satisfactorily sample the entire domain of
interest.
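
The claim that little probability lies beyond the grid can be
quantified with a short calculation. The sketch below assumes a
zero-mean normal vector with independent components, as in the
transformed space, and a grid extent of sqrt(n) standard deviations
along every axis. The function name is hypothetical.

```python
import math


def mass_outside_grid(n, num_features):
    """Probability that at least one component exceeds sqrt(n) standard deviations."""
    inside_one_axis = math.erf(math.sqrt(n) / math.sqrt(2.0))
    return 1.0 - inside_one_axis ** num_features
```

With four features the uncovered mass is under two percent at n = 8
and falls well below a tenth of a percent at n = 16.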




CHAPTER 3 

Line Scanner Imaging Systems 

The primary goal of a remote sensing system is the 
collection from a scene of reflected or emitted electro- 
magnetic energy in selected spectral bands. This task has 
been traditionally accomplished by airborne photographic 
equipment and is analyzed by photointerpretation tech- 
niques. There are several major drawbacks associated 
with such a method. 

The sensitivity of photographic films is generally 
limited to the near ultraviolet-near infrared band; there- 
fore, night time operation is severely limited unless the 
scene is externally illuminated. Clouds, fog and smoke are 
opaque through this portion of electromagnetic spectrum. 
Most importantly, handling of the film itself is awkward 
and the accompanying telemetry problem can make its 
deployment aboard a nonrecoverable vehicle unattractive. 

Nonphotographic sensors overcome many of these
shortcomings. Through the selection of the proper detector,
spectral coverage can be extended to microwave and beyond
where clouds and bad weather do not seriously hinder the
sensor's performance. Having the data in the form of an
electrical signal lends itself to efficient and powerful
transmission and processing techniques.

3.1 Types of Systems 

The majority of current remotely sensed data is obtained
in the ultraviolet, visible and infrared portions
of the spectrum by scanning systems. One of the earliest
of such systems was 'Reconofar' operating in the visible
region [50]. It used either moonlight or an internal
illumination source to produce maps of the ground scene
at night. Lack of detectors with rapid rise time produced
imagery with unsatisfactory resolution compared to
photographic methods. As a result of improvements in detector
technology, current scanning systems can produce imagery
of high quality within a reliable, compact and fairly simple
structure.


3.1.1 Multispectral Scanners 


A widely used earth resources data gathering system is
the electro-optical scanning radiometer, otherwise known as
a multispectral scanner. An MSS is generally an object
plane scanner [51] and consists of a rotating mirror and a
telescope that directs reflected energy from a small portion
of the object plane. A bank of detectors responding to
different wavelengths receives the incoming radiation
which, after detection, sampling and quantization, is
telemetered to the ground station. When such a system is placed
in an aircraft or an earth orbiting satellite, a strip map
of the ground scene is produced. The cross-track coverage
is performed by the oscillating mirror and vehicle motion
accomplishes the along-track coverage. Contiguous coverage
is required to prevent underlap. Underlap can occur if the
satellite speed is too high or the mirror's rotational
rate too slow.

This simple structure can be upgraded to include the
currently employed scanners in which an n sided mirror
rotates at a rate of r revolutions per second, thereby
producing n lines for each rotation. There are a total of
d detectors and, thus, d lines are scanned by each side of
the mirror. A total of n×d lines are scanned for a full
rotation.

Let kx be the dwell time of a detector on each 
resolution element and V and H be the speed and altitude 
of the vehicle, respectively. It can be shown [50] that 
subject to a dwell time not less than kx and a no underlap 
scanning mode, the angular resolution of an MSS has a lower 
bound given by 

* > (2u-k/nd) (V/H) (x) (3-1) 

with equality for contiguous lines. From a hardware point
of view, the adjustable parameters are limited. V and H
are interdependent and are determined by orbit considerations,
x is a property of the detector, and n and d are variable
parameters to choose as a means of the MSS instantaneous
field of view (IFOV) control.

One of the most widely used operational remote sensing
instruments is the Landsat multispectral scanner. Landsat,
an Earth resources monitoring satellite, is positioned
on a polar orbit at an altitude of about 900 km with a
complete global coverage cycle of 18 days. The vehicle's
operation is chosen so as to provide a 14% scan overlap.
The MSS collects data in 4 spectral bands, two visible and
two near infrared, all in spatial registration. Six lines
are scanned simultaneously with an IFOV of 87 μrad,
providing a ground resolution of about 80 m, with a total
cross-range width of 185 km. The Skylab S192 scanner
provided similar resolution with 13 spectral bands from 0.5 μm
to 12.5 μm. Among other MSS systems is the Thematic Mapper
for Landsat D. Spectral coverage is extended to 7 bands
from 0.51 μm to 2.35 μm with some gaps plus a thermal band
from 10.4 μm to 12.6 μm. Angular resolution of 33 μrad will
correspond to a ground IFOV of 30 m at a 900 km altitude
[52].
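
The ground resolution figures quoted above follow from the small
angle relation between angular resolution and altitude. The sketch
below reproduces them, assuming a nadir view and a flat scene, and
also counts the resolution elements across the stated swath. The
function names are hypothetical.

```python
def ground_ifov_m(angular_resolution_urad, altitude_km):
    """Ground footprint in metres: altitude times angular resolution (small angle)."""
    return angular_resolution_urad * 1e-6 * altitude_km * 1e3


def elements_per_line(swath_km, footprint_m):
    """Approximate number of resolution elements across the swath."""
    return swath_km * 1e3 / footprint_m
```

An IFOV of 87 μrad at 900 km gives a footprint of 78.3 m,
consistent with the quoted "about 80 m", and 33 μrad gives 29.7 m,
consistent with the quoted 30 m.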

3.2 System Modeling of a Multispectral Scanner System 

The objectives outlined in the introductory chapter
required a parametric representation and evaluation of the
MSS performance. As in any other complex and integrated
system, the parameters are numerous. Sensor
choice, band selection and telemetry links are but a
few of the interacting components of the system design.
From the viewpoint of information extraction and
processing, however, the spatial characteristics of a
scanner along with the spatial resolution and additive
noise take on a particular significance.

Modeling of the MSS by a linear system opens the
way to the application of existing techniques in system
theory. Since the classification accuracy is totally
a function of class statistics under the Bayes rule,
examination of the random process transformation
carried out by the scanner PSF can be most revealing.
Topics of particular interest are

1. Effect of the scanner IFOV on population 
statistics. 

2. Effect of data spatial correlation on the 
classification accuracy. 

3. Effect of signal-to-noise ratio on 
classification accuracy. 

4. Trade off between spatial resolution and 
SNR. 

5. Effect of spatial resolution on 
classification accuracy. 

6. The interactive relationship between IFOV,
spatial correlation, class statistics, SNR
and classification accuracy.

3.2.1 MSS Spatial Model

The incident electromagnetic energy, after reflection 
from a target, is detected within the scanner IFOV. The 
ultimate goal of such an operation is a perfect reproduction 
of the radiant energy. This objective cannot be accomplished 
with any physically realizable system. A finite IFOV, 
required by detector sensitivity among other things, keeps 
the ground resolution at a finite level. The resolution 
degradation can subsequently be dealt with through various 
image enhancement techniques [53,54]. 

The averaging operation performed by the scanner 
point spread function can be modeled by a linear 
shift-invariant multiple-input, multiple-output system. Input 
signals consist of N random processes in N spectral bands 
corrupted by atmospheric noise and scattering. Each input 
is linearly transformed by the scanner PSF, and additional 
detector and pre-amp noise further contributes to the signal 
degradation. 

Fig. 3-1 is a basic block diagram of this spatial 
model. h(x,y) is the two dimensional PSF to be specified 
for any desired system. Where the MSS is concerned, the 
assumption of a Gaussian shaped IFOV has been widespread. 
The justification is twofold: the experimental results 
obtained with it have been satisfactory and, perhaps 
equally important, the model is mathematically convenient. 

Note that the results obtained hereafter are fundamentally 
independent of the functional form of the PSF. However, 
using this assumption, it is frequently possible to 
obtain closed form expressions and to make comparisons 
with alternate methods, a majority of which adhere to 
the same assumption. 

In a two dimensional plane a Gaussian PSF is specified 
by the following relationship 

$$h(x,y) = c_1\, e^{-x^2/r_0^2}\, e^{-y^2/r_0^2}$$    (3-2) 

The important parameter is $r_0$, the PSF's characteristic 
length, which in effect determines the ultimate ground 
resolution and noise content of the collected data. 
Increasing $r_0$ results in a deterioration of the former 
but improvement of the latter. The significant 
property of h(x,y) is its separability along the 
cross-track and along-track directions, resulting in some 
simplifications of the analytical relationships 
governing the scanner operation. In practice, h(x,y) 
is truncated at some point, usually 0.1 h(0,0), to keep 
the computation time down. The normalizing constant 
$c_1$ provides a unity gain for this averaging operation 
(Appendix A). 
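
As an illustration of how a Gaussian PSF of this kind might be sampled for numerical work, the following Python sketch builds a discrete separable kernel. It is not the procedure of Appendix A. The function name `gaussian_psf`, the unit sample spacing, the truncation of each one-dimensional factor at 0.1 of its peak, and the normalization of the discrete sum to one (standing in for the constant $c_1$) are all assumptions made for this example.

```python
import numpy as np

def gaussian_psf(r0, spacing=1.0, cutoff=0.1):
    """Sampled separable Gaussian PSF, truncated and scaled to unity gain.

    Assumes h(x, y) proportional to exp(-x^2/r0^2) * exp(-y^2/r0^2).
    """
    # exp(-x^2/r0^2) falls to `cutoff` at |x| = r0 * sqrt(ln(1/cutoff)).
    half_width = r0 * np.sqrt(np.log(1.0 / cutoff))
    n = int(np.floor(half_width / spacing))
    x = np.arange(-n, n + 1) * spacing
    h1 = np.exp(-(x / r0) ** 2)      # one-dimensional factor
    h = np.outer(h1, h1)             # separability: h(x, y) = h1(x) * h1(y)
    return h / h.sum()               # discrete stand-in for the constant c_1

h = gaussian_psf(r0=2.0)
```

Because the kernel is an outer product, the two-dimensional averaging can be carried out as two one-dimensional passes, which is the simplification that separability buys.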

An alternate PSF, not as yet operational aboard 
Landsat, is the rectangular function defined by 

$$h(x,y) = \begin{cases} \text{constant}, & |x|,\,|y| < r_0/2 \\ 0, & \text{otherwise} \end{cases}$$ 

The definition of the IFOV adopted here for either a 
Gaussian or rectangular PSF is such that IFOV = $r_0$. 


3.2.2 MSS Statistical Model and Spatial Correlation 

As the input random processes undergo a linear trans- 
formation, so do their statistical properties. In order to 
investigate the various interactive relationships outlined 
previously, an understanding and knowledge of the signal 
flow through the scanner is essential. 

Relating the statistics of the multispectral signal 
at the scanner output to the corresponding part at the input 
can be accomplished in various ways. It has been pointed 
out that a two dimensional convolution is equivalent to a 
matrix multiplication in which one matrix is block 
circulant [55]. Let F and G be the input and output matrices 
arranged in $P^2 \times 1$ column vectors. Then they are 
related by 

$$G = HF$$    (3-3) 

where the PSF matrix H has the following structure 
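
$$H = \begin{bmatrix} H_0 & H_{P-1} & H_{P-2} & \cdots & H_1 \\ H_1 & H_0 & H_{P-1} & \cdots & H_2 \\ H_2 & H_1 & H_0 & \cdots & H_3 \\ \vdots & & & & \vdots \\ H_{P-1} & H_{P-2} & H_{P-3} & \cdots & H_0 \end{bmatrix}$$    (3-4) 
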
Each element in H is itself a PxP matrix. For a particular 
case, a selected number of fields can be chosen and pro- 
cessed by (3-3) to produce the G matrix followed by the 
calculation of a pooled auto and cross spectral correla- 
tion matrix. 
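
The equivalence between a two dimensional convolution and multiplication by a block-circulant matrix can be checked numerically on a small array. The sketch below is only an illustration: it assumes a circular (periodic) convolution, row-by-row stacking of the P x P image into a column vector, and the hypothetical helper name `block_circulant`. The stacking convention of [55] may differ.

```python
import numpy as np

def block_circulant(h):
    """P^2 x P^2 matrix H with H @ f.ravel() equal to the circular
    convolution of the P x P arrays h and f (row-major stacking assumed)."""
    P = h.shape[0]
    H = np.zeros((P * P, P * P))
    for m in range(P):              # output row
        for n in range(P):          # output column
            for k in range(P):      # input row
                for l in range(P):  # input column
                    H[m * P + n, k * P + l] = h[(m - k) % P, (n - l) % P]
    return H

rng = np.random.default_rng(0)
P = 4
f = rng.normal(size=(P, P))         # input "image"
h = np.zeros((P, P))
h[:2, :2] = 0.25                    # small averaging PSF with unity gain
H = block_circulant(h)
G = (H @ f.ravel()).reshape(P, P)   # G = HF
G_fft = np.real(np.fft.ifft2(np.fft.fft2(h) * np.fft.fft2(f)))
```

Each P x P block of `H` depends only on the difference of the row indices modulo P, and is itself circulant, which is the structure shown for H above.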

This method has the advantage of requiring no a priori 
spatial information, yet its data dependent nature limits 
the results of any study to the particular data set 
used. The more general approach, possibly providing closed 
form expressions for the quantities desired, is the 
application of linear system theory techniques to the MSS. 
This, however, requires some a priori specification of data 
properties in an algebraic form, the main item being the 
spatial correlation model. 

Agricultural crop planting, natural formations of 
terrain, water supplies, etc. all exhibit a certain 
homogeneity in their structure; therefore, it is expected 
that the reflected energy sensed by a scanner will show the 
same property in the form of a correlation between adjacent 
pixels of the final digital data set. Comparatively speaking, 
spectral classification has been much more widespread than 
spatial classification, and the spatial properties of 
remotely sensed data have consequently received less than 
full attention. It has been suggested, however, that the 
experimentally observed correlation functions approximately 
follow a decaying exponential [56,57]. This assumption 
implies a Markov model for the spatial characteristics of 
the data. Let $R_k$ be the spatial correlation matrix of the 
kth spectral band 

$$R_k = [r_{ij}], \quad i,j = 0,1,\ldots,n_0-1$$    (3-5) 

Under the two assumptions: (a) Markov correlation 
structure; and (b) separability along the cross-track and 
along-track directions, $R_k$ can be specified as follows 

$$R_k = [r_{ij}] = \left[\rho_{x_k}^{\,i}\, \rho_{y_k}^{\,j}\right], \quad i,j = 0,1,\ldots,n_0-1$$    (3-6) 

where $\rho_{x_k}$ and $\rho_{y_k}$ are the adjacent pixel 
correlation coefficients along the respective directions 
given by 

$$\rho_{x_k} = e^{-a_{kk}}, \qquad \rho_{y_k} = e^{-b_{kk}}$$    (3-7) 
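
A separable Markov correlation matrix of this form is straightforward to tabulate. The sketch below is a minimal illustration, assuming unit pixel spacing and lags counted from zero. The function name `markov_correlation` and the example coefficients 0.8 and 0.7 are chosen for the example and are not taken from the data discussed here.

```python
import numpy as np

def markov_correlation(a, b, n0):
    """n0 x n0 matrix with entries rho_x**i * rho_y**j,
    where rho_x = exp(-a) and rho_y = exp(-b)."""
    rho_x, rho_y = np.exp(-a), np.exp(-b)
    lags = np.arange(n0)
    # Outer product of the two one-dimensional exponential decays.
    return np.outer(rho_x ** lags, rho_y ** lags)

# Hypothetical adjacent-pixel correlations of 0.8 and 0.7.
R = markov_correlation(a=-np.log(0.8), b=-np.log(0.7), n0=5)
```

The matrix has rank one, so its first row and first column determine every other entry; this is the property that the separability check exploits.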


Similarly, the spatial crosscorrelation matrix between two 
bands p and q is defined as 

$$R_{pq} = [r_{ij}] = \left[\rho_{x_{pq}}^{\,i}\, \rho_{y_{pq}}^{\,j}\right], \quad i,j = 0,1,\ldots,n_0-1$$    (3-8) 

where 

$$\rho_{x_{pq}} = e^{-a_{pq}}, \qquad \rho_{y_{pq}} = e^{-b_{pq}}$$    (3-9) 


In order to examine the validity of the Markov model 
and the separability property of the correlation functions, 
a sample aircraft MSS data set is selected and the auto and 
crosscorrelation functions in two spectral bands, one in 
the visible and one in the near infrared, are estimated 
by a lagged-product sum method [58]. The separability 
characteristics can be checked by completing the entire 
correlation matrix, R, using $[r_{ij}] = [r_i r_j]$ and 
comparing it to the experimentally observed quantity. Let E 
be the error matrix associated with this operation, then 

$$E = \left|\, [r_{ij}] - [r_i r_j] \,\right|$$    (3-10) 
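
The separability check can be sketched in a few lines of Python. The version below is an assumption-laden illustration rather than the program used for the tables: it takes the zero-lag row and column of an observed matrix as the factors $r_i$ and $r_j$, and reports the error as a percentage of the observed value. The helper name `separability_error` and the small test matrix are hypothetical.

```python
import numpy as np

def separability_error(R):
    """Return the separable completion of R and the percent error matrix.

    R_sep[i, j] = R[i, 0] * R[0, j] / R[0, 0]; the error is |R - R_sep|
    relative to the observed entry (normalization assumed here).
    """
    R = np.asarray(R, dtype=float)
    R_sep = np.outer(R[:, 0], R[0, :]) / R[0, 0]
    E = 100.0 * np.abs(R - R_sep) / np.abs(R)
    return R_sep, E

# Hypothetical 3 x 3 matrix: exactly separable except for one entry.
R_exact = np.outer([1.0, 0.7, 0.5], [1.0, 0.9, 0.8])
R_obs = R_exact.copy()
R_obs[2, 2] = 0.5                    # separable value would be 0.4
R_sep, E = separability_error(R_obs)
```

By construction the first row and first column of `E` are zero, so any departure from separability shows up only at lags where both indices are non-zero.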

The results are shown in Fig. 3-2 through Fig. 3-4 and 
Tables 3-1 through 3-3. The shape of the correlation 
curves themselves indicates an approximately exponential 
behavior, and a quantitative weighted least-squares 
fit confirms that this assumption is valid. The 
difference between the correlation along the lines and 
along the columns of this data set stems from the fact that 
the analog signals are sampled in a way that generates unequal 
separation between the corresponding ground resolution 

Table 3-1 Error Matrix for Correlation Function Approximation for 
Channel 2. 

Observed correlation matrix [r_ij] 

    0.95  0.87  0.79  0.71  0.64  0.58 
    0.79  0.74  0.67  0.60  0.53  0.48 
    0.55  0.52  0.48  0.42  0.31  0.32 
    0.36  0.35  0.33  0.29  0.25  0.21 
    0.27  0.28  0.27  0.25  0.21  0.18 
    0.24  0.25  0.25  0.23  0.21  0.18 
    0.19  0.20  0.21  0.20  0.19  0.16 

Separable approximation [r_i r_j] 

    0.95  0.87  0.79  0.71  0.64  0.58 
    0.77  0.70  0.63  0.57  0.51  0.47 
    0.51  0.47  0.42  0.38  0.34  0.31 
    0.32  0.30  0.27  0.24  0.21  0.20 
    0.23  0.21  0.19  0.17  0.16  0.14 
    0.20  0.19  0.17  0.15  0.14  0.12 
    0.25  0.13  0.12  0.11  0.10  0.09 

Error matrix E (percent) 

     0     0     0     0     0     0 
     2.6   5.4   5.9   5     3.7   0.2 
     7.2   9.6  12.5   9.5   8.1   3.1 
    11.1  14.2  18.1  17.2  16     4.7 
    14.8  25    29.6  32    23.8  22.2 
    16.6  24    32    34.7  33.3  33.3 
    24    35    42.8  45    47.3  43.7 


Table 3-2 Error Matrix for Correlation Function Approximation for 
Channel 8. 

Observed correlation matrix [r_ij] 

    1.00  0.90  0.77  0.64  0.52  0.41  0.33 
    0.71  0.68  0.59  0.48  0.37  0.28  0.20 
    0.43  0.43  0.38  0.30  0.21  0.12  0.06 
    0.33  0.35  0.33  0.27  0.19  0.12  0.06 
    0.30  0.32  0.31  0.28  0.23  0.17  0.12 
    0.22  0.75  0.26  0.24  0.20  0.15  0.11 
    0.10  0.13  0.14  0.13  0.11  0.08  0.05 

Separable approximation [r_i r_j] 

    1.00  0.90  0.77  0.64  0.52  0.41  0.33 
    0.71  0.64  0.55  0.45  0.37  0.30  0.11 
    0.43  0.39  0.33  0.27  0.22  0.17  0.14 
    0.33  0.30  0.25  0.21  0.17  0.14  0.11 
    0.30  0.27  0.23  0.20  0.15  0.12  0.09 
    0.22  0.20  0.17  0.14  0.11  0.09  0.07 
    0.10  0.90  0.07  0.06  0.05  0.04  0.03 

Error matrix E (percent) 

     0     0     0     0     0     0     0 
     0     5.8   6.7   6.2   0     6.6  45 
     0     9.3  13.1  10     4.5  30    57 
     0    14.2  24.2  22.2  10.5  14.2  45.4 
     0    15.6  25.8  28.5  34.7  29.4  25 
     0    20    34.6  41.6  45    40    36.3 
     0    30.7  50    53.8  54.5  50    40 


Table 3-3 Error Matrix for Cross Correlation Function Approximation 
Between Channels 2 and 8. 

Observed crosscorrelation matrix [r_ij] 

    1.00  0.92  0.81  0.69  0.59  0.50  0.44 
    0.93  0.88  0.78  0.67  0.56  0.48  0.41 
    0.73  0.71  0.64  0.54  0.44  0.36  0.30 
    0.48  0.47  0.43  0.36  0.28  0.21  0.16 
    0.30  0.31  0.29  0.24  0.18  0.12  0.08 
    0.23  0.25  0.24  0.21  0.16  0.12  0.08 
    0.22  0.24  0.24  0.22  0.19  0.15  0.12 

Separable approximation [r_i r_j] 

    1.00  0.92  0.81  0.69  0.59  0.50  0.44 
    0.93  0.86  0.75  0.64  0.55  0.46  0.40 
    0.73  0.67  0.60  0.50  0.43  0.36  0.32 
    0.48  0.44  0.38  0.33  0.28  0.24  0.71 
    0.30  0.27  0.24  0.20  0.17  0.15  0.13 
    0.23  0.21  0.18  0.16  0.13  0.11  0.10 
    0.22  0.20  0.18  0.15  0.13  0.11  0.10 

Error matrix E (percent) 

     0     0     0     0     0     0     0 
     0     2.2   3.8   4.5   1.8   4.16  2.43 
     0     5.6   6.2   7.4   2.3   0     6.75 
     0     6.4  11.6   8.3   0    17.5  23.8 
     0    13.0  17.2  16.6   5.5  20    38.4 
     0    16    25    23.8  18.7   8.3  20 
     0    16.6  25    31.8  31.6  26.6  16.6 

elements along the scan swath and vehicle down-track motion. 
The unusually high cross-track pixel-to-pixel correlation 
is attributed to the use of very high resolution aircraft 
data. For satellite imagery $\rho_{x_k} = 0.8$ is a more 
common value. 

The separability property of the correlation matrices 
appears to be a reasonable assumption according to the 
correlation error matrices. As will be shown later, this 
is not a feature peculiar to this data set but is observed 
throughout most of the multispectral data bases. The main 
property exhibited by E is that the separability 
assumption becomes progressively less valid for higher lag 
values. This, however, is not particularly detrimental to 
the correlation model proposed here: although the error 
expressed in percentage can be relatively high, the 
normalized values of the correlation function in the range 
of concern are themselves quite small and thus carry 
little weight in influencing the final results. 

With the correlation model well defined, the output 
spectral covariance matrix can be specified. Let 
$R_{g_i g_j}$ and $\Sigma_g$ be the output spatial 
correlation matrix between spectral bands i and j and the 
output covariance matrix, respectively, then 

$$\Sigma_g(i,j) = [R_{g_i g_j}(0,0)], \quad i,j = 1,2,\ldots,N$$    (3-11) 


and l 


l 


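Note that when considered over the ensemble of all the 
bands, matrix $R_g$ is an $(n_0 \times N)(n_0 \times N)$ 
partitioned matrix, given by 

$$R_g = \begin{bmatrix} [R_{g_1 g_1}] & [R_{g_1 g_2}] & \cdots & [R_{g_1 g_N}] \\ [R_{g_2 g_1}] & [R_{g_2 g_2}] & \cdots & [R_{g_2 g_N}] \\ \vdots & & & \vdots \\ [R_{g_N g_1}] & [R_{g_N g_2}] & \cdots & [R_{g_N g_N}] \end{bmatrix}$$    (3-12) 

where $[R_{g_i g_j}]$ is the $n_0 \times n_0$ spatial 
correlation matrix. The output spectral covariance matrix 
$\Sigma_g$, however, is only a function of the zero lag 
elements of $R_g$, $R_{g_i g_j}(0,0)$. Therefore, only 
$N \times N$ out of $(n_0 \times N)(n_0 \times N)$ entries 
of $R_g$ need be calculated. It is clear that the spectral 
correlation matrix is a small subset of the spatial 
correlation matrices whose elements have the following 
locations. 

$$\Sigma_g(i,j) = R_g\big((i-1)n_0,\,(j-1)n_0\big), \quad i,j = 1,2,\ldots,N$$    (3-13) 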


The analytical relationship between the input and 
output correlation matrices of an N-band MSS is investigated 
in Appendix A. Specific results are obtained for a 
Markov-correlated data set and for Gaussian and rectangular 
shaped scanner IFOVs. The main result obtained there is a 
scanner characteristic function $W_s(\tau,\eta,a,b)$ given by 

$$W_s(\tau,\eta,a_{ii},b_{ii}) = e^{a_{ii}^2 r_0^2/2}\left[ e^{-a_{ii}\tau}\, Q\!\left(a_{ii} r_0 - \frac{\tau}{r_0}\right) + e^{a_{ii}\tau}\, Q\!\left(a_{ii} r_0 + \frac{\tau}{r_0}\right) \right] \times e^{b_{ii}^2 r_0^2/2}\left[ e^{-b_{ii}\eta}\, Q\!\left(b_{ii} r_0 - \frac{\eta}{r_0}\right) + e^{b_{ii}\eta}\, Q\!\left(b_{ii} r_0 + \frac{\eta}{r_0}\right) \right]$$    (3-14) 



where $a_{ii}$ and $b_{ii}$ are the parameters of the input 
spatial correlation function determining the adjacent pixel 
correlation in band i, $r_0$ is the scanner PSF 
characteristic length and Q is as defined earlier. 
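
The characteristic function for a Gaussian PSF can be evaluated directly. The Python sketch below is one possible reading of (3-14), not the author's program. It assumes that Q is the Gaussian tail probability, Q(x) = P(Z > x) for a standard normal Z, which is defined outside this section, and that lags are measured in the same units as $r_0$. The names `q_function`, `_one_axis` and `w_s` are hypothetical.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) (assumed definition of Q)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def _one_axis(lag, c, r0):
    # One separable factor of (3-14); c stands for a (cross-track) or b
    # (along-track). exp(0.5*(c*r0)**2) can overflow for very large c*r0.
    return math.exp(0.5 * (c * r0) ** 2) * (
        math.exp(-c * lag) * q_function(c * r0 - lag / r0)
        + math.exp(c * lag) * q_function(c * r0 + lag / r0)
    )

def w_s(tau, eta, a, b, r0):
    """Scanner characteristic function for a Gaussian PSF, after (3-14)."""
    return _one_axis(tau, a, r0) * _one_axis(eta, b, r0)

# Hypothetical scene with adjacent-pixel correlation 0.8 in both directions.
a = b = -math.log(0.8)
w_zero_lag = w_s(0.0, 0.0, a, b, r0=2.0)
```

At zero lag each factor reduces to $2 e^{c^2 r_0^2/2} Q(c r_0)$, so a perfectly correlated scene (a = b = 0) gives a weight of exactly one, while less correlated scenes or larger values of $r_0$ give smaller weights.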

$W_s$ plays a central role in the spatial modeling of a 
multispectral scanner. It is a function by which all 
channel variances and band-to-band correlation coefficients 
are weighted to produce the corresponding output quantity. 
Specifically, 

$$\sigma_{g_i}^2 = W_s(0,0,a_{ii},b_{ii})\, \sigma_{f_i}^2, \quad i = 1,\ldots,N$$ 

$$s_{g_i g_j} = \frac{W_s(0,0,a_{ij},b_{ij})}{W_s^{1/2}(0,0,a_{ii},b_{ii})\, W_s^{1/2}(0,0,a_{jj},b_{jj})}\, s_{f_i f_j}, \quad i \neq j = 1,\ldots,N$$    (3-15) 

where $\sigma_{f_i}^2$ and $\sigma_{g_i}^2$ are the input and 
output variances of band i, and $s_{f_i f_j}$, $s_{g_i g_j}$, 
$(a_{ii},b_{ii})$ and $(a_{ij},b_{ij})$ are the input 
crosscorrelation coefficient between bands i and j, the 
corresponding output quantity, the parameters of the band 
i autocorrelation function and the parameters of the bands i 
and j crosscorrelation function, respectively. 

Evaluating $W_s(\tau,\eta,a,b)$ for all values of $\tau$ 
and $\eta$ can complete the entire output spatial matrix. 
The Bayes classifier, however, is not a spatial classifier 
but rather a spectral one and, as a result, knowledge of an 
N×N spectral covariance matrix is sufficient for 
classification purposes. As was envisioned at the beginning, 
developing a parametric model provides significant 
flexibility in the system analysis. For example, $W_s$ can 
selectively supply any desired entry of the output spatial 
matrix. Here, $W_s(\tau,\eta,a,b)$ evaluated at 
$\tau = \eta = 0$ can complete the output spectral 
covariance matrix 

$$W_s(0,0,a,b) = 4\, e^{(a^2+b^2) r_0^2/2}\, Q(a r_0)\, Q(b r_0)$$    (3-16) 


For example, when the input random process is a two spectral 
band data set, the output spectral correlation matrix, 
$S_g$, is given in terms of $S_f$ as follows: 

$$S_g = \begin{bmatrix} 1 & \dfrac{W_s(0,0,a_{12},b_{12})}{W_s^{1/2}(0,0,a_{11},b_{11})\, W_s^{1/2}(0,0,a_{22},b_{22})}\, s_{f_1 f_2} \\ \dfrac{W_s(0,0,a_{12},b_{12})}{W_s^{1/2}(0,0,a_{11},b_{11})\, W_s^{1/2}(0,0,a_{22},b_{22})}\, s_{f_1 f_2} & 1 \end{bmatrix}$$    (3-17) 
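
A two-band case can be worked numerically as follows. This is a sketch under stated assumptions: a Gaussian PSF, Q taken as the Gaussian tail probability, and purely illustrative correlation parameters. The names `w0` and `output_band_correlation` are hypothetical.

```python
import math

def w0(a, b, r0):
    """Zero-lag characteristic function, 4 exp((a^2+b^2) r0^2/2) Q(a r0) Q(b r0)."""
    q = lambda x: 0.5 * math.erfc(x / math.sqrt(2.0))  # assumed definition of Q
    return 4.0 * math.exp(0.5 * (a * a + b * b) * r0 * r0) * q(a * r0) * q(b * r0)

def output_band_correlation(s_f, auto_1, auto_2, cross, r0):
    """Output correlation coefficient between two bands.

    auto_1, auto_2 and cross are (a, b) pairs for the band 1 and band 2
    autocorrelation functions and for their crosscorrelation function.
    """
    ratio = w0(*cross, r0) / math.sqrt(w0(*auto_1, r0) * w0(*auto_2, r0))
    return ratio * s_f

# Illustrative values only: input band-to-band correlation of 0.6.
same = output_band_correlation(0.6, (0.2, 0.2), (0.2, 0.2), (0.2, 0.2), r0=2.0)
lower = output_band_correlation(0.6, (0.2, 0.2), (0.2, 0.2), (0.5, 0.5), r0=2.0)
```

When the crosscorrelation decays at the same rate as the two autocorrelations, the ratio is one and the band correlation passes through unchanged; a faster decaying crosscorrelation lowers it.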

It is clear that, depending on the particular value of 
$W_s$, the output correlation matrices, and hence 
classification accuracies, will be modified. The variations 
of $W_s$ as a function of scene correlation and scanner 
spatial parameters can be very illuminating. For a Gaussian 
scanner PSF, $W_s$ is plotted vs. the sample-to-sample 
correlation for a fixed line-to-line correlation, with the 
IFOV used as a running parameter, Fig. 3-5 through 3-12. 
The adjacent sample correlation coefficient ranges from a 
near white noise 0.1 to total correlation of 1 (constant 
signal amplitude). The adjacent line correlation 
coefficient extends from 0.65 to 1. Similar plots are shown 
for a rectangular PSF, Fig. 3-13 through 3-15. Examination 
of these results reveals several important features: (a) 
since $0 \le W_s \le 1$, the output channel variances are 
always smaller than the corresponding input quantity. This 
is an expected result due to the averaging operation of the 
scanner; (b) for a fixed sample-to-sample correlation, the 
spectral band variances at the output increase with 
decreasing IFOV with an accompanying degradation in 
classification accuracy; (c) for a fixed IFOV, the channel 
variances decrease with decreasing scene correlation. These 
observations apply to any one of the cases with a fixed 
inter-line correlation. When IFOV and sample-to-sample 
correlation are both fixed, a higher adjacent line 
correlation produces an increase in the output band variance. 

The variations of the spectral correlation coefficients 
between bands are similarly determined. From (3-17), 
depending on the parameters of the correlation model, the 
ratio of two characteristic functions can potentially 
either increase or decrease the spectral band correlation. 

3.3 Noise in Multispectral Scanner System Modeling 

Random noise is the ultimate limiting factor in a data 
transmission and processing system. Although the per- 
formance of remote sensing systems is affected by many 
other parameters , additional noise entering the system at 
various stages can have a significant impact on the final 
analysis of the data. Hence, no model would be complete 
without the identification of the noise sources and deter- 
mination of their contribution to the system performance 
degradation. 

There are two broad categories of noise generating 
sources: external and internal. External noise is 
primarily caused by the atmosphere in the form of molecular 
absorption and scattering. In the case of the MSS in Landsat 
there are two major absorption bands at a wavelength of 
about 0.68 μm due to the presence of oxygen and water vapor, 
which result in an attenuation of up to 10% or more for a 
vertical path from the surface of the earth to the platform. 
Scattering is the major cause of attenuation of the 
reflected energy. It has been experimentally observed that 
combined Rayleigh and Mie scattering can cause up to 40% 
transmission loss through the atmosphere at 0.4 μm with a 
decreasing effect at higher wavelengths [59]. A designer 
has little influence over these natural phenomena and 
can only select appropriate windows in the atmospheric 
transmission spectrum to minimize absorption and scattering. 
In view of this situation, consideration of external noise 
sources will not be pursued further. 

3.3.1 System Noise 

The noise generated within the scanner subsystem is 
primarily of two types: (a) noise introduced by the 
sensors in the detection stage of the incoming radiation; 
and (b) the quantization noise developed in the A/D 
conversion process prior to transmission to the ground 
stations. 


Detectors are the most basic and crucial elements in 
a scanner system. Initially, thermal detectors, in which 
the impinging radiation heats a sensitive element and a 
temperature-dependent property is monitored, were in wide- 
spread use. The advent of high speed scanning mechanisms, 
requiring extremely short dwell time on a ground resolu- 
tion element, required detectors with much higher sensitivity 
than thermal detectors. Photodetectors, where the photon 
energy in the incident radiation produces free charge 
carriers, are now primarily used in visible and infrared 
detection stages and provide time constants of the order of 
nanoseconds. Their disadvantage, compared to thermal 
detectors, is their limited spectral response and in most 
cases they require cooling. The currently operational 
Landsat-2 employs photomultipliers for the bands 0.5-0.6 μm, 
0.6-0.7 μm, 0.7-0.8 μm, and silicon photodiodes for the range 
0.8-1.1 μm. Landsat-C will carry a thermal band, 10.4-12.6 μm, 
using two mercury-cadmium-telluride detectors [60]. 

The noise generated by a detector is a combination of photon 
and photoemission noise. Let ε < 1 be the photocathode 
efficiency of a photomultiplier with a gain $G^n$, T the 
sampling time, q = 1.6×10^{-19} coulomb the charge on an 
electron and $I_s$ the signal current out of the detector. 
The signal-to-noise power ratio at the output is given 
by [61]. 


The sampling time per detector, T, for the Landsat MSS is 
about 0.4 μs. Assuming some typical values for other 
parameters: 

$I_s$ = 1 mA 
G = 3 
n = 10 

the SNR at the detector's output is approximately 

SNR ≈ 42 dB    (3-19) 

The next noise source is the A/D conversion process 
where analog signals are sampled and quantized to $2^B$ 
levels, each B bits long. The performance of the quantizer 
can be evaluated in two ways. It is clear that the signal 
presented to the digitizer is already corrupted by detector 
noise, so the signal plus noise is actually being quantized 
and assigned to one of the $2^B$ levels. Therefore, the 
presence of noise makes this assignment subject to a finite 
probability of error, thus affecting the performance measure. 
The second method simply involves the specification of the 
noise power introduced by a uniform quantizer, which is given 
by [62] 

$$\sigma_n^2 = \Delta^2/12$$    (3-20) 

where Δ is the quantization step size. Defining a balanced 
system in which the detector and quantization noise are 
equal, the combined SNR is therefore 

SNR ≈ 39 dB    (3-21) 
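
Both figures can be reproduced with a short numerical check. The sketch below is illustrative only: it simulates a uniform rounding quantizer on a synthetic, uniformly distributed input to estimate the quantization noise power, and it combines two independent noise sources of equal power to show the 3 dB loss. The function name `combined_snr_db`, the input distribution and the sample count are assumptions of the example.

```python
import numpy as np

def combined_snr_db(snr_a_db, snr_b_db):
    """SNR in dB when two independent noise sources act on the same signal."""
    # Noise-to-signal power ratios add for independent sources.
    noise = 10.0 ** (-snr_a_db / 10.0) + 10.0 ** (-snr_b_db / 10.0)
    return -10.0 * np.log10(noise)

rng = np.random.default_rng(0)
delta = 1.0                                   # quantization step size
x = rng.uniform(0.0, 64.0, size=200_000)      # synthetic analog samples
err = delta * np.round(x / delta) - x         # quantization error
noise_power = err.var()                       # close to delta**2 / 12

balanced = combined_snr_db(42.0, 42.0)        # equal detector and quantizer noise
```

With a detector SNR of 42 dB and an equal quantization noise contribution, the total noise power doubles and the combined figure drops by 10 log10(2), about 3 dB.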

The data generated by Landsat is quantized to one of 
64 levels (6 bits per pixel) with Δ = 1. In terms of the 
first performance measure, the assumption of equality of 
quantization and detector noise contribution to total 
system noise implies that at the quantizer input 

$$\psi = \frac{\Delta}{\sigma_n} = \sqrt{12}$$    (3-22) 

where ψ is the ratio of step size to rms noise. For this 
particular value of ψ, the probability of the 6th bit being 
incorrectly assigned is 0.12 and essentially zero for 6th 
and/or the 5th bit [63,61]. 

Random noise in the context of multispectral remotely 
sensed data takes on a particular role. In the more clas- 
sical applications of pattern recognition such as an M-ary 
communication channel employing one of M equally likely 
and known signals, noise is identified as the primary 
limiting factor in detecting the transmitted message with 
zero probability of error. The distinction emerges at this 
point that multispectral data is itself a realization of a 
stochastic process and, as such, there is an inherent finite 
probability of error, regardless of noise, associated with 
the testing of hypotheses. In the analysis of the data, 
the noise and signal statistics will be merged to represent 
the population statistics, and if the additive noise has 
Gaussian properties, the populations will still be normally 
distributed. In fact, it is plausible for some other 
hypothetical class to have identical statistics without 
noise as a part of its own properties. Therefore, a 'noisy' 
class can potentially be as separable as another 'noise-free' 
population. 

The additive random noise has two major impacts on the 
statistics of multispectral data. The obvious one is the 
broadening of the distributions, resulting in more inter- 
class overlap hence a higher error rate. The second effect 
is on the data spatial correlation where the adjacent pixel 
correlation decreases with increasing noise power. Consider 
two univariate populations. Fig. 3-16, with equal variances 
where the first class is corrupted by random additive 
Gaussian noise. After transmission through the scanner, 
according to the properties of W g , emerges with a 
smaller variance than with corresponding classification 

accuracies, P , and P . . Consider two other populations, 

c jt^ c l w 2 

and ( 1)2 with identical variances such that 

Var = Variwg} = Varfu^} * Var{« 2 > (3-23) 

where neither or are affected by random noise. There- 
fore , 
(3-24) 


and similarly for • At the scanner output, the clas- 

sification accuracies are P . * and Pi ' . Prom the spatial 

c|(ij 1 c ] 

correlation properties of and expressed in (3-24) 

and W it is clear that 
s 


P 

c 




(3-25) 


Does this mean the noisier the data the better? In terms of 
the intrinsic classifiability of a population, perhaps, but 
the question is what is being classified. Random noise can 
alter the statistics of a class to the point that they no 
longer represent the specific class under consideration. In 
fact, in a multipopulation environment, the modified 
statistics could approach those of another existing 
population and thus increase the overall error rate, not to 
mention the esthetic degradation of the image that the 
noise causes. 

Another topic to be considered and defined is the term 
signal-to-noise ratio. It is frequently desirable to 
examine the performance of a system in a variable noise 
content environment. When the subject is the actual data, 
it should be noted that one is already dealing with a 
noisy signal and, therefore, any additive noise will be in 
addition to the existing quantity. Let R be the noisy 
data, S the signal and N the noise, then 

R(x,y) = S(x,y) + N(x,y)    (3-26) 

The artificial noise N' is added to R to produce R' 

R'(x,y) = R(x,y) + N'(x,y)    (3-27) 

The ratios $(SNR)' = \sigma_R^2/\sigma_{N'}^2$ and 
$SNR = \sigma_S^2/\sigma_{N'}^2$ are related by 

$$(SNR)' = \frac{\sigma_N^2}{\sigma_{N'}^2} + SNR$$    (3-28) 

If the noise content of the data is considerably smaller 
than the added noise, then (SNR)' ≈ SNR. 
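
The relation between the two ratios follows in one step if the signal S and the data noise N are assumed uncorrelated, so that their variances add. The derivation below is a sketch under that assumption, and it reads SNR as the ratio of the signal variance to the variance of the added noise N'; both points are interpretations rather than statements taken from the text.

```latex
\sigma_R^2 = \sigma_S^2 + \sigma_N^2
\quad\Longrightarrow\quad
(SNR)' = \frac{\sigma_R^2}{\sigma_{N'}^2}
       = \frac{\sigma_N^2}{\sigma_{N'}^2} + \frac{\sigma_S^2}{\sigma_{N'}^2}
       = \frac{\sigma_N^2}{\sigma_{N'}^2} + SNR
```

The first term vanishes as the added noise power grows relative to the noise already present in the data, which gives the stated approximation.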

The way to determine the noise power to be added to 
the multispectral data for simulation purposes is open to 
discussion. Consider a frame of data, R(x,y), containing 
M populations. A particular SNR can be specified and 
from that the noise variance derived. The signal variance, 
however, is a pooled average of all the class variances and, 
consequently, the given SNR does not hold for any one 
of the populations. Another alternative considered in [64] 
is to measure noise solely on the basis of its variance. 

The definition adopted here is to base the variance 
of the signal on the entire picture frame and in effect 
lump the individual class variances that may be present in 
the particular data set. The reasoning behind this approach 
is that long before any knowledge is available about the 
population structure of a data set, random noise is already 
added to the signal; therefore, any class-dependent 
definitions of SNR would be unrealistic. These considerations 
are primarily applicable to actual data sets. In a highly 
controlled simulation environment, however, some or all of the 
above restrictions can be relaxed. For example, noise can 
be added to each class in different quantities in order to 
observe its effects on the classifiability of one particular 
population. 

The next question to be resolved is the location, in 
the MSS spatial model, at which this definition of SNR 
applies. In Fig. 3-1 additive noise could enter both at 
the input and the output of the scanner system. While this 
is a realistic model, from a practical point of view the 
input noise does not limit the system performance greatly, 
for the following reasons. First, the other noise sources 
involved, i.e., quantization and detector noise, are 
generally more dominant than any disturbance arising from 
the atmosphere during normal operating conditions. Second, 
and more importantly, is the MSS response to a white noise 
random process. It has been pointed out that the variance 
of the output process grows with the input adjacent pixel 
correlation. The variation of $W_s$ vs. ρ indicates that 
when the input scene displays little spatial correlation, 
the variance of the output process is a very small fraction 
of the corresponding input quantity. Let f(x,y), $N_f(x,y)$, 
f'(x,y) and $N_f'(x,y)$ be the input random process, input 
additive white noise, the output random process and the 
noise component of the output signal, respectively, then 

$f'(x,y) = f(x,y) * h(x,y)$    (3-29) 

$N_f'(x,y) = N_f(x,y) * h(x,y)$    (3-30) 

Define $(SNR)_f = Var\{f(x,y)\}/Var\{N_f(x,y)\}$ and 
$(SNR)_{f'} = Var\{f'(x,y)\}/Var\{N_f'(x,y)\}$. The following 
inequalities hold 

$Var\{f'(x,y)\} < Var\{f(x,y)\}$    (3-31) 

$Var\{N_f'(x,y)\} \ll Var\{N_f(x,y)\}$    (3-32) 

hence 

$(SNR)_{f'} \gg (SNR)_f$    (3-33) 
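
The strong attenuation of white noise by the PSF can be quantified for a sampled kernel: for uncorrelated input samples, the output variance equals the input variance multiplied by the sum of the squared kernel weights. The sketch below assumes a sampled Gaussian PSF with unit sample spacing, generous truncation at four characteristic lengths and unity gain; the function name `white_noise_gain` is hypothetical.

```python
import numpy as np

def white_noise_gain(r0, spacing=1.0, half_width=4.0):
    """Ratio of output to input variance for white noise passed through a
    sampled, unity-gain Gaussian PSF of characteristic length r0."""
    n = int(np.ceil(half_width * r0 / spacing))
    x = np.arange(-n, n + 1) * spacing
    h1 = np.exp(-(x / r0) ** 2)
    h = np.outer(h1, h1)
    h /= h.sum()                     # unity gain
    # For uncorrelated input samples the variances add with weights h**2.
    return float(np.sum(h ** 2))

gain_small = white_noise_gain(r0=2.0)
gain_large = white_noise_gain(r0=4.0)
```

For $r_0 = 2$ samples the white noise variance is reduced to roughly 4 percent of its input value, and the reduction becomes stronger as the IFOV grows, while a highly correlated scene loses comparatively little variance under the same averaging.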

It then follows that the noise component of the output 
process (prior to quantization and detector noise) is quite 
small and for all practical purposes can be neglected. 
Random noise generated in the detection stage of the 
incoming signal is, therefore, the major disturbance factor. 
Having narrowed the noise contribution to one source, the 
logical definition of SNR would be the ratio of the MSS 
output variance (negligible noise content) to that of 
quantization and detector noise ($N_{dq}$); i.e., 

$(SNR)_o = Var\{f'(x,y)\}/Var\{N_{dq}(x,y)\}$    (3-34) 

Note that for a fixed noise power, $(SNR)_o$ is always smaller 
than $(SNR)_i$, the input signal-to-noise ratio 

$(SNR)_i = Var\{f(x,y)\}/Var\{N_{dq}(x,y)\}$    (3-35) 

Since 

$Var\{f'(x,y)\} = W_s\, Var\{f(x,y)\} < Var\{f(x,y)\}$    (3-36) 

hence 

$(SNR)_o = W_s\, (SNR)_i$    (3-37) 

CHAPTER 4 

Experimental Evaluation of the Parametric 
Multiclass Bayes Error Estimator 

An experimental investigation was carried out to con- 
firm the proper operation of the CSP error estimation 
algorithm described in Chapter 2. In order to satisfactor- 
ily accomplish the task, as much peripheral uncertainty 
as possible must be eliminated so that any deviation from 
the desired result can be traced directly to the methodology 
or the computer codes. This requirement eliminates the use 
of real data which is likely to have characteristics that 
are highly dependent on outside and generally uncontroll- 
able elements. A more satisfactory approach is the 
generation of a completely synthetic data base with known 
and prescribed properties. After the validation process 
has been successfully completed, actual Landsat data 
will be employed and the probability of correct classifi- 
cation for the various populations within that set 
estimated by a count estimator and the CSP estimation 
technique and the results compared. 

4.1 Description of the Data Base 

The generation of a synthetic data base requires 
control of two characteristics: spectral and spatial. Stage I 
simulates M populations with N features, each of which has 
a specified multivariate normal density function. 

Let μ and Σ be the desired mean vector and covariance 
matrix. The following linear transformation on a random 
vector X ~ N(0, I) produces a vector Y with these statistics: 

Y = A X + μ 

where A is the square-root matrix associated with Σ, i.e., 

$A A^T = \Sigma$ 
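
The transformation can be sketched in Python as follows. This is an illustration, not the thesis program: it uses the lower-triangular Cholesky factor as one admissible choice of square-root matrix (the symmetric square root would serve equally well), and the function name `simulate_class`, the mean vector and the covariance matrix shown are hypothetical.

```python
import numpy as np

def simulate_class(mean, cov, n_samples, rng):
    """Draw samples Y = A X + mean with X ~ N(0, I) and A A^T = cov."""
    mean = np.asarray(mean, dtype=float)
    A = np.linalg.cholesky(np.asarray(cov, dtype=float))  # A @ A.T == cov
    X = rng.standard_normal((n_samples, mean.size))       # uncorrelated samples
    return X @ A.T + mean                                  # one sample per row

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0, 0.5])                 # hypothetical class mean
sigma = np.array([[2.0, 0.6, 0.0],
                  [0.6, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])             # hypothetical class covariance
Y = simulate_class(mu, sigma, 50_000, rng)
```

Successive rows of `Y` are statistically independent, so a field built this way carries the prescribed spectral statistics but essentially no pixel-to-pixel correlation.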

The number of samples per class is generally decided by the 
examination of histograms as a check for normality of the 
statistics. No attempt was made to incorporate the 
geometrical shape as a factor in generating the random 
field and any specified number of lines and columns in a 
rectangular array of points can be produced. Statistically, 
this data set represents an 'ideal' data set except for 
the lack of any serial correlation in Y caused by the same 
property in X. The almost zero pixel-to-pixel correlation 
is immaterial due to the fact that the Bayes spectral class- 
ifier and the CSP error estimators do not utilize any spa- 
tial correlation information available for the data. A 
schematic diagram of the entire data base simulation and 


With the probability of correct classification of the 
various populations in a data set as the prime performance 
index , the M-class, N-feature Bayes error estimator 
developed in Chapter 2 comprises the basic tool by which the 
MSS system model is analyzed. A comprehensive set of test 
procedures is required to verify the proper operation of 
this algorithm and to observe its response to variable 
operating states. 

The merits of a simulated data base were discussed in 
sec. 4.1. The question raised now is how to select the 
features associated with such a base. In addressing this 
question, the following should be kept in mind. The main 
purpose here is the validation of the error estimation model 
independently of other system components. Therefore, the 
test populations need not and, indeed, cannot be 'representa- 
tive' of the classes found in the multispectral data. Hence, 
any conclusion drawn from the results serves only to 
evaluate the performance of the algorithm. In generating 
the simulated data, however, certain general guidelines 
were followed. 

1. The minimum number of populations should be 3 
and the minimum number of dimensions preferably 
be the same. 
2. The structure of the class statistics should lend
itself to logical and simple manipulation of its
parameters.

3. A separability measure should be defined that would 
reflect the changes in population parameters. 

As an initial condition, three classes arranged in a
simplex are considered, Fig. 4-2. This arrangement keeps
the computation time low, thus allowing the examination of
the algorithm's performance for fine sampling grids; allows
systematic parameter variation by assigning the mean vectors
to different coordinates along their respective feature axes;
and maintains geometrical insight as the population
statistical structure is varied. Two basic categories are
considered: (a) constant covariance matrices, variable 

mean vectors; and (b) constant mean vectors, variable 
covariance matrices. Because of the multiplicity of 
parameters describing the class statistics, it would be 
desirable to have a separability criterion that would lump 
all of the variables together and generate a single number 
after each change. 

There are a number of separability measures to choose 
from. Bhattacharyya distance (B-distance) and divergence 
are the most notable. The former criterion will be adopted 
here mainly because it provides an upper bound for the error 
probability which can be compared with other error estimators 
examined here. Let the two populations ω_1 and ω_2 be
distributed according to N(m_1, Σ_1) and N(m_2, Σ_2). Then J, the
B-distance, and P_e, the resulting upper bound on the
probability of error, are given by [41]

(4-4)

The Chernoff bound, C_B, in (4-4) applies to a binary set but
it can be generalized by a pairwise summation. The resulting
bound, however, is not adequately tight.
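
For two Gaussian populations the Bhattacharyya distance has a well-known closed form. The sketch below evaluates it together with the associated two-class error bound, assuming the usual textbook expressions J = (1/8)(m_2 - m_1)^T [(Σ_1 + Σ_2)/2]^{-1} (m_2 - m_1) + (1/2) ln( |(Σ_1 + Σ_2)/2| / sqrt(|Σ_1| |Σ_2|) ) and P_e <= sqrt(P_1 P_2) exp(-J). The notation and normalization may differ from (4-4), and the function names are hypothetical.

```python
import numpy as np

def bhattacharyya_distance(m1, S1, m2, S2):
    """Bhattacharyya distance between N(m1, S1) and N(m2, S2) (textbook form)."""
    m1, m2 = np.asarray(m1, float), np.asarray(m2, float)
    S1, S2 = np.asarray(S1, float), np.asarray(S2, float)
    S = 0.5 * (S1 + S2)                      # average covariance
    d = m2 - m1
    mean_term = 0.125 * d @ np.linalg.solve(S, d)
    # Log-determinants are used instead of determinants for numerical stability.
    _, ld = np.linalg.slogdet(S)
    _, ld1 = np.linalg.slogdet(S1)
    _, ld2 = np.linalg.slogdet(S2)
    cov_term = 0.5 * (ld - 0.5 * (ld1 + ld2))
    return float(mean_term + cov_term)

def error_upper_bound(J, p1=0.5, p2=0.5):
    """Two-class bound on the error probability; equal priors are an assumption."""
    return float(np.sqrt(p1 * p2) * np.exp(-J))
```

With identical covariances the second term vanishes and J reduces to one eighth of the squared Mahalanobis distance between the means, which is why J responds both to mean shifts (sec. 4.2.2) and to changes in scatter (sec. 4.2.1).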

The only practical reference against which the results 
of the CSP error estimation algorithm can be compared is 
the Monte-Carlo (MC) type simulation of the population 
statistics using pseudorandom numbers, assignment of 
samples to their respective categories by a Bayes classifier 
and finally a count estimator to provide the classification 
accuracy. A criterion, however, needs to be defined if the
results of the comparison are to be meaningful. One such
measure is the equality of the total number of samples used
in the estimation process; i.e.,

    number of samples|_{MC} = (n+1)^N        (4-5)

where the right side of (4-5) is the total number of cells
in the sampled space of Ω.
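
The Monte Carlo reference procedure described above (pseudorandom samples, Bayes classification, count estimate) can be sketched as follows. This is an illustrative outline under stated assumptions: equal prior probabilities, known class statistics, hypothetical function names, and an example configuration in which the assignment of the correlation matrices to particular classes is assumed rather than taken from the text.

```python
import numpy as np

def log_gaussian(x, mean, cov):
    """Row-wise log density of N(mean, cov) at the points in x."""
    d = x - mean
    maha = np.einsum("ij,ij->i", d @ np.linalg.inv(cov), d)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + mean.size * np.log(2.0 * np.pi))

def mc_accuracy(means, covs, n_per_class, seed=0):
    """Count estimate of per-class and overall accuracy (equal priors assumed)."""
    rng = np.random.default_rng(seed)
    per_class = []
    for k in range(len(means)):
        x = rng.multivariate_normal(means[k], covs[k], size=n_per_class)
        # Bayes rule with equal priors: pick the class with the largest density.
        scores = np.column_stack(
            [log_gaussian(x, m, c) for m, c in zip(means, covs)])
        per_class.append(float(np.mean(scores.argmax(axis=1) == k)))
    return per_class, float(np.mean(per_class))

# Hypothetical configuration: means at 0.7 on each axis, unit variances.
means = [0.7 * np.eye(3)[i] for i in range(3)]
covs = [np.array([[1.0, 0.75, 0.15], [0.75, 1.0, 0.45], [0.15, 0.45, 1.0]]),
        np.array([[1.0, 0.80, 0.00], [0.80, 1.0, 0.10], [0.00, 0.10, 1.0]]),
        np.array([[1.0, 0.94, 0.15], [0.94, 1.0, 0.05], [0.15, 0.05, 1.0]])]
per_class, overall = mc_accuracy(means, covs, n_per_class=14 ** 3)
```

The overall figure is taken here as the plain average of the three per-class accuracies, which matches equal priors; the sample count per class is an arbitrary choice for the illustration.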
4.2.1 Fixed Mean, Variable Scatter
In a feature space of multivariate nature the multi- 
plicity of dimensions generates a vast number of possible 
combinations of parameters to manipulate. Even for the
moderate size case proposed here there are 3 variances,
9 covariances and 3 mean values capable of taking on a
continuum of states. Therefore, a
certain degree of arbitrariness must be employed in selecting
the initial values and their subsequent variations.

The approach selected here is the adoption of one variable 
statistic against a fixed background in the form of two 
static populations. The fixed statistic is selected after
examination of the correlation matrices obtained for different
types of ground cover [65]. An attempt was made
to choose correlation structures that would approximately 
represent two typical cases, albeit crudely. As pointed 
out before, whether this is true or not has little bearing 
on the results of this validation procedure. This choice 
simply displays an attempt to be as realistic as possible. 
Assuming that the set of three spectral bands is composed 
of two in the visible and one in the near-infrared, the 
fixed correlation matrices are given by, 

    S_{f_1} = \begin{bmatrix} 1 & 0.8 & 0 \\ 0.8 & 1 & 0.1 \\ 0 & 0.1 & 1 \end{bmatrix},
    \qquad
    S_{f_2} = \begin{bmatrix} 1 & 0.94 & 0.15 \\ 0.94 & 1 & 0.05 \\ 0.15 & 0.05 & 1 \end{bmatrix}
    \qquad (4-6)


The class with the variable scatter is specified by a 
choice of 4 different across-band correlation values rang- 
ing from a low of 0.15, medium of 0.45, medium high of 0.75 
and a high of 0.95. The permutation of these four numbers 
taken 3 at a time generates 24 different cases out of 
which 13 result in invalid non-positive definite matrices. 
For each remaining case, an average B-distance J is computed
and the 11 permissible combinations are tabulated in
the order of increasing separability in Table 4-1. J% is the
value of J normalized to the highest J in the table
and S_ij is the correlation coefficient between channels i and j.

The means are fixed at 0.7σ on each axis with σ = 1. The
grid size for the CSP error estimation technique ranged from
4 to 14 cells per axis with an increment of 1, which is
equivalent to 4^3 to 14^3 samples for the corresponding MC
estimator. For each of the 11 cases outlined in Table 4-1,
there exist 3 plots. The first two show the variation of
the CSP (MC) error estimator vs. grid (sample) size and the
third plot shows the variance of the error estimate for the
two aforementioned techniques. Each plot is accompanied by
a table of values. Throughout sec. 4.2.1, 'case i' corresponds
to the particular separability of rank i (from the
top) of Table 4-1, N_B is the number of boundary cells
as a percentage of the inside cells, and G_s, the grid size,
is the number of cells per axis.

The results of the variable scatter geometry provide 
the basic understanding of the potentials and operating 
principles of this error estimation technique and exhibit 
many properties universal to this algorithm. The first and 
probably the most important item to be explored is the 
variation and dependence of the estimate on the grid size. 
This relationship is particularly crucial due to the fact 
that although there is a theoretical convergence estab- 
lished, the rate of convergence determines the feasibility 
of implementation of this technique as a viable alternative 
to other data dependent algorithms. This is especially 
true since the number of cells within the grid bears an 
exponential relationship with the dimensionality of the 
data. Examination of the CSP error estimator vs. grid size 
plots quickly disposes of this concern. The pattern 
exhibited throughout is one of a rapid climb to a steady 
state value and oscillations of small magnitude around it. 
The rapid convergence is best demonstrated in Case 6,
where the estimate of the overall classification accuracy
at the smallest grid size was off 7.1% from its final value.
It jumped 6.2% when the grid size was incremented by one step
to 5 cells per axis and from then on gained only 0.9% to
level off at 72.9% for 14 cells per axis. In terms of the
total number of cells involved, the initial rise of 6.2%
was gained by an increase of 61 cells while the addition of
2619 more cells improved the estimate by only 0.9%. Similar
behavior is observed in Case 2 where the one step rise of
7.8% was followed by a 10 step rise of 1.1%. These effects
are evident in all 11 cases with varying degrees
of intensity. On the average the initial rise of 5.14%
was followed by a 1.78% increase toward the final value.
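
The cell counts quoted above follow directly from the cubic growth of a three dimensional grid. The short check below reproduces them; the accuracy values are the Case 6 overall CSP figures for 4 and 14 cells per axis as tabulated in this chapter, and the variable names are illustrative.

```python
# Number of cells in a 3-dimensional grid with n cells per axis.
def total_cells(n, dims=3):
    return n ** dims

# Overall CSP accuracy (percent) for Case 6 at the smallest and largest grids.
overall_csp_case6 = {4: 65.8, 14: 72.9}

first_step_cells = total_cells(5) - total_cells(4)    # cells added going 4 -> 5
remaining_cells = total_cells(14) - total_cells(5)    # cells added going 5 -> 14
total_deficit = round(overall_csp_case6[14] - overall_csp_case6[4], 1)

print(first_step_cells, remaining_cells, total_deficit)
```

Most of the 7.1% deficit is thus recovered by the first 61 additional cells, while the remaining 2619 cells contribute under one percent.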

This property is remarkable in view of the performance 
of various sampling techniques. For a 3 dimensional grid 
with 5 cells per axis, there are a total of 125 points 
involved which provide an estimate of aforementioned 
quality. The performance of the MC technique with that 
small a sample size is totally inadequate. In fact, gener- 
ating the required Gaussian data base with 125 samples is 
itself very difficult. Fig. 4-3 demonstrates the devia- 
tion from normality of the statistics for small sample size 
while for comparison purposes, a corresponding histogram 
using 2744 (14^3) samples is shown in Fig. 4-4. It is,
therefore, clear that the small sample behavior of the CSP
technique is far superior to the small sample behavior
of the Monte Carlo technique. It can be argued, however,
TABLE 4-1 TEST CASES ARRANGED BY INCREASING SEPARABILITY.
VARIABLE SCATTER.

S_12   S_13   S_23      J     J%    C_B%
0.75   0.15   0.45   0.50     29    39.0
0.45   0.15   0.75   0.52     30    40.6
0.75   0.45   0.15   0.54     31    41.6
0.15   0.45   0.75   0.58     34    43.9
0.45   0.75   0.15   0.59     34    44.3
0.15   0.75   0.45   0.60     35    44.7
0.45   0.15   0.95   1.48     86    69.4
0.95   0.15   0.45   1.52     88    69.5
0.45   0.95   0.15   1.58     91    [illegible]
0.15   0.95   0.45   1.58     92    70.7
0.95   0.45   0.15   1.72    100    72.1
TABLE 4-2 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP AND MC
ESTIMATION TECHNIQUES. CASE 1

         P̂c|ω1         P̂c|ω2         P̂c|ω3          P̂c
G_s    CSP    MC     CSP    MC     CSP    MC     CSP    MC
  4   54.9  71.9    69.6  75.0    70.0  81.3    64.8  76.0
  5   64.5  68.6    72.2  74.4    73.3  81.0    70.1  74.7
  6   63.6  69.9    71.0  75.0    73.1  78.1    69.3  74.3
  7   65.6  69.4    71.4  73.8    76.3  76.9    71.1  73.4
  8   66.3  64.3    69.4  73.6    76.5  76.2    70.8  71.3
  9   68.2  68.4    70.3  76.7    75.9  75.9    71.5  73.7
 10   69.3  66.6    71.7  74.4    77.7  75.3    72.9  72.1
 11   68.4  69.1    72.8  73.8    76.8  75.6    72.7  72.9
 12   68.8  67.6    73.7  76.9    76.6  76.4    73.0  73.6
 13   68.9  69.9    74.2  74.2    76.6  77.4    73.2  73.8
 14   68.6  69.3    74.3  74.1    75.9  76.7    72.9  73.4

TABLE 4-3 PERCENT CSP AND MC STANDARD DEVIATIONS ACHIEVED FOR CLASS 1.

G_s    CSP    MC    N_B
  4    4.3   5.7   99.9
  5    2.6   4.1   64.4
  6    3.0   3.1   68.8
  7    2.6   2.5   51.3
  8    2.2   2.0   52.8
  9    1.9   1.7   41.5
 10    1.7   1.4   41.5
 11    1.5   1.3   34.2
 12    1.4   1.1   35.3
 13    1.3   1.0   30.0
 14    1.2   0.9   29.8

FIG. 4-5 CSP CLASSIFICATION ACCURACY ESTIMATE VS. GRID SIZE.
VARIABLE SCATTER. CASE 1

FIG. 4-7 MC AND CSP ERROR ESTIMATE STANDARD DEVIATIONS.
VARIABLE SCATTER. CASE 1
TABLE 4-4 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP AND MC
ESTIMATION TECHNIQUES. CASE 2

         P̂c|ω1         P̂c|ω2         P̂c|ω3          P̂c
G_s    CSP    MC     CSP    MC     CSP    MC     CSP    MC
  4   48.7  68.8    68.8  75.0    71.4  89.1    63.0  77.6
  5   67.2  67.8    71.7  71.9    73.6  80.2    70.8  73.3
  6   61.1  65.3    70.9  73.5    75.8  81.6    69.3  73.5
  7   65.3  66.7    71.4  74.1    77.7  78.1    71.5  72.9
  8   64.7  63.4    70.3  72.5    78.4  78.1    71.2  71.3
  9   65.1  65.0    70.6  76.0    78.1  77.9    71.3  73.0
 10   64.3  61.6    72.1  74.0    78.5  77.7    71.7  71.1
 11   63.2  66.4    72.0  73.8    78.4  77.8    71.2  72.7
 12   64.1  64.7    73.1  75.1    78.3  78.0    71.8  72.6
 13   64.4  66.0    73.3  73.5    78.0  79.3    71.9  72.9
 14   64.5  63.6    73.4  74.1    77.8  78.5    71.9  72.1

TABLE 4-5 PERCENT CSP AND MC STANDARD DEVIATIONS ACHIEVED FOR CLASS 1.

G_s    CSP    MC    N_B
  4    3.3   6.0   74.0
  5    2.5   4.3   40.5
  6    2.2   3.3   41.8
  7    1.5   2.6   30.0
  8    1.2   2.1   31.7
  9    1.2   1.8   25.1
 10    1.4   1.5   26.0
 11    1.4   1.3   21.2
 12    1.1   1.1   21.2
 13    0.7   1.0   18.2
 14    0.8   0.9   18.2

FIG. 4-8 CSP CLASSIFICATION ACCURACY ESTIMATE VS. GRID SIZE.
VARIABLE SCATTER. CASE 2

FIG. 4-9 MC CLASSIFICATION ACCURACY ESTIMATE VS. SAMPLE SIZE.
VARIABLE SCATTER. CASE 2

FIG. 4-10 MC AND CSP ERROR ESTIMATE STANDARD DEVIATIONS.
VARIABLE SCATTER. CASE 2
TABLE 4-6 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP AND MC
ESTIMATION TECHNIQUES. CASE 3

         P̂c|ω1         P̂c|ω2         P̂c|ω3          P̂c
G_s    CSP    MC     CSP    MC     CSP    MC     CSP    MC
  4   62.4  73.4    65.7  73.4    72.6  82.8    66.9  76.6
  5   66.8  77.7    67.5  71.9    73.7  77.7    69.4  75.8
  6   68.1  78.1    67.6  70.4    73.9  78.1    69.9  75.5
  7   71.5  80.9    67.8  71.6    73.8  71.2    71.0  76.5
  8   73.3  73.6    67.4  70.0    75.7  77.3    72.1  73.6
  9   76.0  75.9    67.8  74.5    75.5  75.6    73.1  75.3
 10   79.4  74.0    69.0  70.4    76.8  75.2    75.1  73.2
 11   78.6  75.2    70.2  71.5    76.3  76.2    75.0  74.3
 12   78.3  75.0    71.1  73.8    76.3  76.9    75.2  75.2
 13   78.8  77.1    71.7  71.9    76.7  77.1    75.8  75.4
 14   78.0  75.8    71.7  70.9    76.3  75.9    75.3  74.2

TABLE 4-7 PERCENT CSP AND MC STANDARD DEVIATIONS ACHIEVED FOR CLASS 1.

G_s    CSP    MC    N_B
  4    4.0   5.2   91.4
  5    3.4   3.7   64.1
  6    3.0   2.8   58.5
  7    2.6   2.2   46.4
  8    2.2   1.8   43.7
  9    1.4   1.5   35.4
 10    1.2   1.3   34.4
 11    1.1   1.1   28.7
 12    1.0   1.0   27.9
 13    0.9   0.9   24.0
 14    0.8   0.8   23.8

FIG. 4-11 CSP CLASSIFICATION ACCURACY ESTIMATE VS. GRID SIZE.
VARIABLE SCATTER. CASE 3
TABLE 4-8 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP AND MC
ESTIMATION TECHNIQUES. CASE 4

         P̂c|ω1         P̂c|ω2         P̂c|ω3          P̂c
G_s    CSP    MC     CSP    MC     CSP    MC     CSP    MC
  4   58.3  67.2    68.4  71.9    73.0  90.6    66.5  76.6
  5   63.8  66.9    72.3  71.9    74.5  81.0    70.2  73.3
  6   62.4  66.3    70.4  73.0    75.5  81.6    69.4  73.6
  7   66.1  66.0    71.1  73.5    77.9  79.3    71.7  72.9
  8   65.3  65.5    69.3  71.9    78.6  78.7    71.1  72.0
  9   64.8  65.3    70.4  75.3    78.0  78.7    71.1  73.1
 10   64.2  62.7    71.6  72.4    78.9  77.7    71.6  71.0
 11   65.6  65.6    71.4  72.4    79.2  78.3    72.1  72.1
 12   67.2  65.5    72.5  74.5    79.7  78.8    73.1  72.9
 13   66.6  67.1    72.6  72.8    79.7  80.5    72.9  73.5
 14   66.1  64.4    73.3  73.2    79.1  79.1    72.8  72.2

TABLE 4-9 PERCENT CSP AND MC STANDARD DEVIATIONS ACHIEVED FOR CLASS 1.

G_s    CSP    MC    N_B
  4    2.5   5.9   48.6
  5    1.7   4.2   32.0
  6    1.9   3.2   27.7
  7    1.2   2.5   23.4
  8    1.1   2.1   23.3
  9    0.7   1.8   19.0
 10    0.6   1.5   18.0
 11    0.5   1.3   15.6
 12    0.6   1.1   15.6
 13    0.5   1.0   13.9
 14    0.4   0.9   13.3

FIG. 4-15 MC CLASSIFICATION ACCURACY ESTIMATE VS. SAMPLE SIZE.
VARIABLE SCATTER. CASE 4

FIG. 4-16 MC AND CSP ERROR ESTIMATE STANDARD DEVIATIONS.
VARIABLE SCATTER. CASE 4

TABLE 4-10 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP AND MC
ESTIMATION TECHNIQUES. CASE 5

         P̂c|ω1         P̂c|ω2         P̂c|ω3          P̂c
G_s    CSP    MC     CSP    MC     CSP    MC     CSP    MC
  4   68.7  65.6    66.7  73.4    74.2  85.9    67.2  75.0
  5   70.3  69.4    68.9  71.1    74.6  80.2    71.3  73.6
  6   71.7  75.5    67.1  73.5    73.5  81.1    70.8  76.7
  7   71.9  73.5    67.3  70.7    75.4  76.5    71.5  73.6
  8   73.3  70.0    67.0  72.3    75.5  78.3    71.9  73.6
  9   72.6  73.3    68.1  74.3    76.8  77.2    72.5  74.9
 10   71.9  69.3    68.5  70.6    77.0  76.4    72.6  72.1
 11   72.4  72.6    69.3  71.2    77.8  77.1    73.2  73.6
 12   71.7  72.0    71.4  73.7    78.1  77.2    73.7  74.3
 13   73.3  75.2    71.7  71.4    77.6  78.8    74.2  75.1
 14   73.0  73.0    71.3  70.2    77.1  76.7    73.8  73.3

TABLE 4-11 PERCENT CSP AND MC STANDARD DEVIATIONS ACHIEVED FOR CLASS 1.

G_s    CSP    MC    N_B
  4    3.7   5.5   54.5
  5    3.2   4.0   43.9
  6    2.6   3.0   37.0
  7    2.2   2.4   30.8
  8    1.6   2.0   28.3
  9    1.4   1.6   24.5
 10    1.3   1.4   22.6
 11    1.4   1.2   19.7
 12    1.3   1.1   18.8
 13    1.0   0.9   17.1
 14    1.0   0.8   16.4

FIG. 4-17 CSP CLASSIFICATION ACCURACY ESTIMATE VS. GRID SIZE.
VARIABLE SCATTER. CASE 5

FIG. 4-18 MC CLASSIFICATION ACCURACY ESTIMATE VS. SAMPLE SIZE.
VARIABLE SCATTER. CASE 5

FIG. 4-19 MC AND CSP ERROR ESTIMATE STANDARD DEVIATIONS.
VARIABLE SCATTER. CASE 5
TABLE 4-12 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP AND MC
ESTIMATION TECHNIQUES. CASE 6

         P̂c|ω1         P̂c|ω2         P̂c|ω3          P̂c
G_s    CSP    MC     CSP    MC     CSP    MC     CSP    MC
  4   59.9  67.2    64.3  71.9    73.2  89.1    65.8  76.0
  5   67.6  64.5    71.5  71.1    77.1  80.2    72.1  71.9
  6   65.6  69.4    69.9  73.5    75.5  82.1    70.3  75.0
  7   66.5  67.9    70.6  72.5    77.6  79.3    71.6  73.3
  8   65.6  64.3    68.4  72.3    77.3  79.5    70.4  77.0
  9   65.9  65.6    69.4  75.3    77.4  78.3    70.9  73.1
 10   66.5  65.3    70.2  71.1    79.0  77.9    71.9  71.5
 11   67.8  67.9    70.5  71.9    79.7  78.2    72.7  72.7
 12   66.7  67.4    72.2  73.7    79.7  78.8    72.8  73.3
 13   66.6  70.2    72.0  72.0    79.1  80.6    72.5  74.3
 14   67.8  67.9    72.6  71.2    78.3  78.9    72.9  72.7

TABLE 4-13 PERCENT CSP AND MC STANDARD DEVIATIONS ACHIEVED FOR CLASS 1.

G_s    CSP    MC    N_B
  4    2.5   5.9   41.2
  5    2.1   4.2   29.2
  6    1.9   3.2   31.8
  7    1.5   2.5   23.5
  8    1.4   2.1   23.0
  9    1.0   1.7   19.1
 10    1.2   1.5   19.3
 11    0.8   1.3   16.3
 12    0.8   1.1   15.5
 13    0.7   1.0   13.6
 14    0.6   0.9   13.1

FIG. 4-20 CSP CLASSIFICATION ACCURACY ESTIMATE VS. GRID SIZE.
VARIABLE SCATTER. CASE 6
(Horizontal axis: no. of cells per axis. Curves: Class 1, Class 2,
Class 3, overall. Chernoff bound = 44.7.)

FIG. 4-22 MC AND CSP ERROR ESTIMATE STANDARD DEVIATIONS.
VARIABLE SCATTER. CASE 6
TABLE 4-14 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP AND MC
ESTIMATION TECHNIQUES. CASE 7

         P̂c|ω1         P̂c|ω2         P̂c|ω3          P̂c
G_s    CSP    MC     CSP    MC     CSP    MC     CSP    MC
  4   87.1  98.4    69.6  81.3    77.1  87.5    77.9  89.1
  5   95.8  99.9    72.7  71.9    81.4  79.3    83.3  83.7
  6   95.5  98.5    73.1  73.5    77.7  80.6    82.1  84.2
  7   97.8  98.5    73.3  75.3    77.4  79.3    82.8  84.4
  8   97.7  98.3    71.2  74.2    78.6  77.3    82.5  83.3
  9   98.8  97.7    72.4  77.0    81.1  79.4    84.1  84.7
 10   98.6  98.6    73.6  74.6    81.2  77.5    84.4  83.6
 11   99.0  98.2    73.7  74.4    82.2  78.5    84.9  83.7
 12   98.8  98.3    75.6  75.5    80.9  79.7    85.1  84.5
 13   98.9  98.4    75.1  74.9    81.0  79.6    85.0  84.3
 14   98.9  98.3    75.2  74.7    80.3  79.3    84.8  84.1

TABLE 4-15 PERCENT CSP AND MC STANDARD DEVIATIONS ACHIEVED FOR CLASS 1.

G_s    CSP    MC    N_B
  4    3.3   1.2   27.0
  5    2.2   0.9   23.5
  6    1.2   0.7   23.0
  7    1.4   0.5   19.1
  8    1.4   0.4   18.5
  9    0.9   0.4   16.7
 10    0.9   0.3   15.9
 11    0.8   0.3   14.7
 12    0.5   0.2   13.9
 13    0.5   0.2   13.2
 14    0.5   0.2   12.5

FIG. 4-23 CSP CLASSIFICATION ACCURACY ESTIMATE VS. GRID SIZE.
VARIABLE SCATTER. CASE 7

[FIG.: MC CLASSIFICATION ACCURACY ESTIMATE VS. SAMPLE SIZE.]
TABLE 4-16 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP AND MC
ESTIMATION TECHNIQUES. CASE 8

         P̂c|ω1         P̂c|ω2         P̂c|ω3          P̂c
G_s    CSP    MC     CSP    MC     CSP    MC     CSP    MC
  4   86.0  92.2    69.6  81.3    68.5  89.1    74.7  87.5
  5   95.6  97.5    74.1  87.2    75.1  70.7    81.6  81.0
  6   94.2  98.0    73.4  76.0    74.6  79.6    80.7  84.5
  7   97.5  97.5    74.4  76.9    79.2  76.9    83.7  83.7
  8   97.4  96.3    72.5  77.1    78.8  73.6    82.9  82.3
  9   97.7  97.3    73.3  79.1    80.0  77.1    83.7  84.5
 10   97.8  98.1    75.0  76.7    80.5  74.8    84.4  83.2
 11   98.4  97.5    75.7  76.2    78.4  76.5    84.2  83.4
 12   98.3  97.5    76.6  78.4    77.7  78.2    84.2  84.7
 13   98.1  96.8    76.6  76.6    78.3  78.7    84.3  84.0
 14   98.3  97.0    76.9  77.1    78.1  75.9    84.4  83.3

TABLE 4-17 PERCENT CSP AND MC STANDARD DEVIATIONS ACHIEVED FOR CLASS 1.

G_s    CSP    MC    N_B
  4    2.3   1.8   41.8
  5    2.4   1.3   24.8
  6    1.8   0.9   28.6
  7    1.5   0.8   22.2
  8    1.2   0.6   22.4
  9    1.0   0.5   18.4
 10    0.8   0.4   19.1
 11    0.8   0.4   15.9
 12    0.9   0.3   19.0
 13    0.5   0.3   14.9
 14    0.3   0.3   17.5

FIG. 4-26 CSP CLASSIFICATION ACCURACY ESTIMATE VS. GRID SIZE.
VARIABLE SCATTER. CASE 8
(Horizontal axis: no. of cells per axis. Curves: 1 - Class 1,
2 - Class 2, 3 - Class 3, □ - overall. Chernoff bound = 69.5.)
TABLE 4-20 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP AND MC
ESTIMATION TECHNIQUES. CASE 10

         P̂c|ω1         P̂c|ω2         P̂c|ω3          P̂c
G_s    CSP    MC     CSP    MC     CSP    MC            CSP    MC
  4   87.0  99.0    69.6  75.0    73.5  [illegible]    76.7  88.5
  5   96.3  98.3    73.7  71.1    80.9  81.0           83.6  83.5
  6   95.7  99.5    73.3  75.5    80.7  83.2           83.2  86.1
  7   97.7  99.9    74.6  72.2    81.7  80.2           84.7  84.2
  8   98.0  99.4    72.3  71.7    82.9  80.8           84.4  84.0
  9   99.1  98.6    72.5  75.9    81.9  81.2           84.5  85.2
 10   99.0  98.9    74.0  73.7    81.8  81.0           84.9  84.5
 11   99.0  99.2    74.7  73.2    80.8  79.9           84.9  84.1
 12   99.0  98.5    75.1  75.1    81.3  82.4           85.1  85.3
 13   99.2  98.7    74.7  72.6    81.8  83.3           85.2  84.9
 14   99.3  98.7    74.6  73.9    82.4  81.6           85.4  84.7

TABLE 4-21 PERCENT CSP AND MC STANDARD DEVIATIONS ACHIEVED FOR CLASS 1.

G_s    CSP    MC    N_B
  4    2.5   1.0   33.3
  5    2.3   0.7   20.0
  6    1.8   0.6   22.3
  7    1.3   0.4   18.2
  8    1.4   0.4   18.9
  9    1.0   0.3   15.0
 10    0.8   0.3   15.5
 11    0.6   0.2   13.2
 12    0.6   0.2   13.3
 13    0.6   0.2   11.6
 14    0.6   0.2   11.5

FIG. 4-32 CSP CLASSIFICATION ACCURACY ESTIMATE VS. GRID SIZE.
VARIABLE SCATTER. CASE 10
TABLE 4-22 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP AND MC
ESTIMATION TECHNIQUES. CASE 11

         P̂c|ω1         P̂c|ω2         P̂c|ω3          P̂c
G_s    CSP    MC     CSP    MC     CSP    MC     CSP    MC
  4   87.5  98.4    69.4  79.7    78.5  92.2    78.5  90.1
  5   98.3  99.2    73.7  74.4    80.9  83.5    83.6  85.7
  6   95.9  99.5    71.6  76.0    80.5  85.2    82.7  86.9
  7   98.7  99.9    73.0  74.7    81.5  82.4    84.4  85.7
  8   98.6  99.8    72.2  73.8    82.7  83.5    84.5  85.7
  9   99.4  99.7    72.4  77.9    83.3  84.2    85.0  87.3
 10   99.4  99.9    73.6  75.1    84.5  82.9    85.8  86.0
 11   99.6  99.5    74.3  74.1    84.2  82.5    86.1  85.4
 12   99.6  99.8    75.6  76.4    84.0  84.7    86.4  87.0
 13   99.7  99.7    75.7  75.2    83.9  84.9    86.5  86.6
 14   99.7  99.7    75.9  75.1    83.8  83.4    86.5  86.1

TABLE 4-23 PERCENT CSP AND MC STANDARD DEVIATIONS ACHIEVED FOR CLASS 1.

G_s    CSP    MC    N_B
  4    2.3   0.7   25.0
  5    2.3   0.5   20.0
  6    1.8   0.4   18.8
  7    1.6   0.3   16.1
  8    1.4   0.2   16.6
  9    1.2   0.2   14.8
 10    1.3   0.2   16.8
 11    0.9   0.1   15.2
 12    0.7   0.1   16.0
 13    0.3   0.1   15.1
 14    0.7   0.1   15.3

FIG. 4-35 CSP CLASSIFICATION ACCURACY ESTIMATE VS. GRID SIZE.
VARIABLE SCATTER. CASE 11

FIG. 4-37 MC AND CSP ERROR ESTIMATE STANDARD DEVIATIONS.
VARIABLE SCATTER. CASE 11


TABLE 4-24 COMPARISON OF CSP AND MC PERCENT CLASSIFICATION ACCURACY.
VARIABLE SCATTER.

          P̂c|ω1         P̂c|ω2                P̂c|ω3          P̂c
Case    CSP    MC     CSP           MC     CSP    MC     CSP    MC
  1    68.6  69.3    74.3          74.1   75.9  76.7    72.9  73.4
  2    64.5  63.6    73.4          74.1   77.8  78.5    71.9  72.1
  3    78.0  75.8    71.7          70.9   76.3  75.9    75.3  74.2
  4    66.1  64.4    73.3          73.2   79.1  79.1    72.8  72.2
  5    73.0  73.0    71.3          70.2   77.1  76.7    73.8  73.3
  6    67.8  67.9    72.6          71.2   78.3  78.9    72.9  72.7
  7    98.9  98.3    75.2          74.7   80.3  79.3    84.8  84.1
  8    98.3  97.0    76.9          77.1   78.1  75.9    84.4  83.3
  9    99.3  98.8    [illegible]   74.5   82.7  81.1    85.6  84.8
 10    99.3  98.7    74.6          73.9   82.4  81.6    85.4  84.7
 11    99.7  99.7    75.9          75.1   83.8  83.4    86.5  86.1
that in any MC simulation process such small sample sizes
are not used anyway, but it is precisely the reasons outlined
above that make employment of a large data base
mandatory. This requirement would not be overly restrictive
with unlimited computation time. Since this is generally 
not the case, adequate performance with small sample 
sizes becomes a significant property. The sufficiency 
of small grid sizes for adequate performance was expected 
considering the structure of the sampling grid. In sec. 2.5 
this matter was discussed and it was pointed out that the 
grid is a partitioned hypercube with each edge 2 /n 
long. Therefore, small values of n are capable of sampling 
substantial portions of the feature space. 

The next systematic feature in the variation of the
CSP estimator is a periodic oscillation for each increment
of the grid size. This phenomenon, like most other properties
of this estimator, is the product of the geometry of
the problem. As described before, the rule governing the
assignment of sampling cells to a particular domain can
potentially exclude (include) the entire cell even though
only a portion of it lies outside (inside). The grid, being
a dynamic structure, interacts with the fixed boundaries to
produce the oscillatory character of the estimate, the manner
and intensity of which depend on the shape of the boundaries
involved. Among all the cells that are located around the 
contours of there are always some excluded from the 




inside domain but the centers of which are close enough to
the boundary such that one increment in the grid size would
move them to the inside. This outside-to-inside shift would
turn an underestimating grid into an overestimating one.
The size of this step depends on the number of cells capable
of making this shift. It is not hard to see that with a
grid composed of elements with linear features, the worst
case occurs when the boundaries themselves are linearly
structured. In fact, when the feature space is divided
by a set of hyperplanes, this periodic cycle can take on
substantial amplitude and hence provide a worst case
situation for this algorithm. This is in general a very
minor limitation due to the fact that actual remotely
sensed data, and most data in general where the information
itself is a realization of a stochastic process, are
unlikely to be optimally classified into regions bounded
by hyperplanes. Examination of the numerical values of the
estimates for various cases here shows that after a
steady state value has been reached, the magnitudes of the
oscillation peaks are well within 1 percent.
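
The whole-cell assignment effect can be seen in a toy setting. The sketch below is not the CSP algorithm; it is a simplified two dimensional stand-in in which a cell of the unit square is counted as "inside" whenever its center falls within a quarter disc of radius 1, and the counted fraction is compared with the true area π/4. The function name and the choice of region are assumptions made purely for illustration.

```python
import numpy as np

def cell_center_fraction(n):
    """Fraction of the n x n cells of the unit square whose centers lie
    inside the quarter disc x^2 + y^2 <= 1 (a whole cell is counted or not)."""
    centers = (np.arange(n) + 0.5) / n
    xx, yy = np.meshgrid(centers, centers)
    return float(np.mean(xx ** 2 + yy ** 2 <= 1.0))

true_value = np.pi / 4.0
for n in range(4, 15):
    # The signed error changes size and sign irregularly as n is incremented,
    # because boundary cells flip between "inside" and "outside".
    print(n, round(cell_center_fraction(n) - true_value, 4))
```

Each increment of n shifts the cell centers relative to the fixed boundary, so a grid that underestimates at one size may overestimate at the next, while the overall error shrinks as the boundary cells become a smaller share of the total.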

The variation of the MC estimate with the sample size 
exhibits less recognizable features mainly due to the under- 
lying randomness of the process. What is particularly 
different is the absence of the initial rise of the 
classification accuracy estimate. This observation should 
be viewed with caution, however, due to the small sample 
sizes involved. In order to use the results of these
estimates in a conclusive manner, any comparisons made
should be restricted to samples of greater than 1000
(equivalent to 10 cells per axis), in which range the
estimate exhibits an adequately small variance.

One topic yet unexplored is how close the CSP
estimate of the classification accuracy is to the Bayes
estimate. In the general case under study, the availability
of such a reference is quite limited and in fact count
estimators are the only alternative. Therefore, the
availability of the MC estimation results makes the
required comparison feasible. Table 4-24 lists 4 classification
accuracy estimates obtained via CSP (MC) techniques
for the highest grid size (sample size). Throughout this
table, the values of the two estimates are quite close and
in two cases (P̂c|ω1 Case 5 and P̂c|ω2 Case 4) the results
are identical to one significant figure. The differential for
P̂c ranges from a low of 0.2% for Cases 2 and 6 to a high of
1.1% for Cases 3 and 8. Averaged over all the cases, this
difference amounts to 0.61%.
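
The range and average quoted above can be checked against the overall accuracy columns of Table 4-24. The snippet below does so with the tabulated values at 14 cells per axis; the list names are illustrative, and because the tabulated entries are rounded to one decimal the recomputed average agrees with the quoted figure only to within rounding (roughly 0.6%).

```python
# Overall accuracy (percent) for Cases 1..11, taken from Table 4-24.
csp = [72.9, 71.9, 75.3, 72.8, 73.8, 72.9, 84.8, 84.4, 85.6, 85.4, 86.5]
mc = [73.4, 72.1, 74.2, 72.2, 73.3, 72.7, 84.1, 83.3, 84.8, 84.7, 86.1]

# Absolute CSP-MC differential per case, rounded to the table's precision.
diffs = [round(abs(a - b), 1) for a, b in zip(csp, mc)]

smallest = min(diffs)
largest = max(diffs)
cases_smallest = [i + 1 for i, d in enumerate(diffs) if d == smallest]
cases_largest = [i + 1 for i, d in enumerate(diffs) if d == largest]
average = sum(diffs) / len(diffs)

print(smallest, cases_smallest, largest, cases_largest, round(average, 2))
```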

One of the most desirable properties of any estimator 
is consistency. The error variance is calculated using 
(2-37) and is plotted for all the 11 cases. Examination of 
these plots clearly shows that the variances of the CSP 
estimates are monotonically decreasing as the number of 
cells per axis increases. This is particularly 
significant because, as discussed in sec. 2.4.1, (2-37)
does not conclusively indicate that

    \lim_{n \to \infty} \mathrm{Var}\{\hat{\varepsilon}_T\} = 0

although it strongly suggests it. This property is
brought about by the fact that the total number of the
points on the boundary, N_B, as a percentage of the points
inside, monotonically decreases with increasing grid size.
This observation is consistent with the assertion that
the boundary cells are the only error causing elements
in the CSP algorithm. Comparing the CSP and MC error
variances for different cases, several properties are
distinguished. The CSP error variance, for the medium
recognition rates, is generally below that of the MC technique.
The rate at which the MC variance falls, however,
is faster and thus if their initial values are close a
crossover takes place for large sample sizes. This
difference in the rate of decay is evident from the
expression for CSP error variance. Rewriting (2-37)


n N B 

Var fe T J = jj <£) I 


(4-7) 


i=l 


The corresponding variance for an MC estimator is given by

$$\operatorname{Var}\{\hat{\epsilon}\} = \frac{\epsilon(1-\epsilon)}{N_t} \qquad (4\text{-}8)$$

where $\epsilon$ is the Bayes probability of error. Noting that
$N_t = n^N$, it is clear that both estimators fall off at a
rate of $1/N_t$. The CSP error variance, however, has another
sample-dependent term, which steadily increases with
increasing $N_t$ and thus cancels the $1/N_t$ term to some extent,
hence the slower convergence. In the high classification
accuracy bracket both estimators have small variances, with
that of MC slightly below CSP. This property is due to
the fact that $\epsilon(1-\epsilon)$ is the dominant factor for small $N_t$.
Once the initial value of the MC error variance is smaller,
its faster fall-off keeps it below that of CSP. The
differences involved, however, are small. Selecting a
medium size grid, the absolute value of the standard deviation
differential ranges from a high of 1.16% for Case 11 to a
low of 0.13% for Cases 3 and 5.
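As a rough illustration of the MC variance in (4-8), the short sketch below evaluates the standard deviation of an error-count estimate for a few sample sizes. The 25% error rate and the sample sizes are hypothetical values chosen for illustration only; they are not taken from the test cases.

```python
import math


def mc_error_std(eps: float, n_samples: int) -> float:
    """Standard deviation of a Monte Carlo error-count estimate, per (4-8)."""
    return math.sqrt(eps * (1.0 - eps) / n_samples)


# Hypothetical error rate of 25% at several sample sizes.
for n in (500, 2000, 6000):
    print(n, round(100.0 * mc_error_std(0.25, n), 2), "% standard deviation")
```

Under these assumed numbers the standard deviation falls as the inverse square root of the sample size, which is the 1/N_t variance decay noted in the text.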

4.2.2 Fixed Scatter, Variable Mean 
In order to observe the variation of the probability
of misclassification with changes in the mean of a
population, the simplex arrangement of Fig. 4-2 was maintained
along with the fixed statistics of $\omega_2$ and $\omega_3$. Case 1 of
sec. 4.2.1, the smallest separability, was selected as the
initial starting point and the nonzero component of the
Class 1 mean, $m_1$, was incremented in steps of $0.1\sigma$. A total
of 7 cases ranging from J = 0.55 to J = 0.96 were covered and
are listed in Table 4-25. Similar to the variable scatter
case, the classification error estimate is obtained using
CSP and MC techniques. In order to avoid duplication
TABLE 4-25 TEST CASES ARRANGED BY INCREASING SEPARABILITY.
VARIABLE MEAN.

$m_1$     J
 0.8     0.55     57      41.8
 0.9     0.60     62      44.7
 1.0     0.66     68      47.5
 1.1     0.73     73      51.4
 1.2     [?]      [?]     53.2
 1.3     [?]      [?]     55.9
 1.4     0.96     100     58.4


TABLE 4-26 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP
ESTIMATION TECHNIQUE. CASE 12

$G_s$   $\hat{P}_{c|\omega_1}$   $\hat{P}_{c|\omega_2}$   $\hat{P}_{c|\omega_3}$   $\hat{P}_c$
  4      57.6     69.6     71.2     66.2
  5      66.8     72.3     75.5     71.5
  6      67.0     71.0     74.5     70.8
  7      71.0     71.4     76.2     72.9
  8      72.3     69.6     76.5     72.8
  9      72.9     70.4     77.0     73.4
 10      73.2     72.6     77.4     74.4
 11      72.7     73.2     77.4     74.4
 12      72.1     74.2     76.8     74.4
 13      72.7     74.3     77.6     74.9
 14      72.1     74.4     76.9     74.5
TABLE 4-27 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP
ESTIMATION TECHNIQUE. CASE 13

$G_s$   $\hat{P}_{c|\omega_1}$   $\hat{P}_{c|\omega_2}$   $\hat{P}_{c|\omega_3}$   $\hat{P}_c$
  4      60.1     69.6     73.7     67.8
  5      74.1     72.3     76.9     74.5
  6      74.3     71.0     75.6     73.7
  7      76.9     71.6     75.6     74.7
  8      76.1     69.8     76.8     74.3
  9      75.9     71.3     77.0     74.7
 10      75.4     73.0     78.1     75.5
 11      75.0     73.5     77.3     75.3
 12      75.3     74.4     78.2     76.0
 13      75.1     74.4     78.0     75.8
 14      75.0     74.5     77.4     75.6
FIG. 4-39 CSP CLASSIFICATION ACCURACY ESTIMATE VS. GRID SIZE
VARIABLE MEAN. CASE 13


TABLE 4-28 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP
ESTIMATION TECHNIQUE. CASE 14

$G_s$   $\hat{P}_{c|\omega_1}$   $\hat{P}_{c|\omega_2}$   $\hat{P}_{c|\omega_3}$   $\hat{P}_c$
  4      69.0     69.6     73.7     70.8
  5      79.5     72.3     77.2     76.4
  6      76.4     71.0     75.6     74.4
  7      79.0     71.6     75.8     75.5
  8      77.3     70.2     77.6     75.0
  9      77.5     71.6     77.8     75.6
 10      77.4     73.2     78.4     76.4
 11      77.8     73.5     78.8     76.7
 12      77.9     74.4     78.8     77.1
 13      79.1     74.4     78.9     77.5
 14      79.6     74.6     78.4     77.5
TABLE 4-29 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP
ESTIMATION TECHNIQUE. CASE 15

$G_s$   $\hat{P}_{c|\omega_1}$   $\hat{P}_{c|\omega_2}$   $\hat{P}_{c|\omega_3}$   $\hat{P}_c$
  4      74.7     69.6     75.1     73.1
  5      80.9     72.3     77.2     76.8
  6      77.5     71.2     76.4     75.0
  7      80.0     72.1     75.9     76.0
  8      78.5     71.0     78.3     75.9
  9      80.3     71.8     78.5     76.9
 10      80.8     73.3     79.4     77.8
 11      81.7     73.6     79.6     78.3
 12      82.8     74.5     79.7     79.0
 13      82.6     74.5     79.3     78.8
 14      82.3     74.7     79.2     78.7
TABLE 4-30 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP
ESTIMATION TECHNIQUE. CASE 16

$G_s$   $\hat{P}_{c|\omega_1}$   $\hat{P}_{c|\omega_2}$   $\hat{P}_{c|\omega_3}$   $\hat{P}_c$
  4      75.5     69.6     77.8     74.3
  5      81.4     72.3     77.5     77.0
  6      79.2     71.2     76.4     75.6
  7      81.8     72.3     77.4     77.2
  8      82.5     71.8     78.5     77.6
  9      85.2     71.8     79.0     78.7
 10      85.4     73.3     80.3     79.7
 11      85.2     73.6     80.3     79.7
 12      84.5     74.5     79.9     79.7
 13      84.2     74.7     80.0     79.6
 14      84.0     75.0     79.5     79.5
TABLE 4-31 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP
ESTIMATION TECHNIQUE. CASE 17

$G_s$   $\hat{P}_{c|\omega_1}$   $\hat{P}_{c|\omega_2}$   $\hat{P}_{c|\omega_3}$   $\hat{P}_c$
  4      75.8     69.6     77.8     74.4
  5      81.8     72.3     77.9     77.3
  6      80.8     71.4     76.8     76.3
  7      84.9     74.0     77.8     78.9
  8      87.2     71.8     79.4     79.4
  9      87.7     71.8     80.2     79.9
 10      86.5     73.3     80.8     80.2
 11      86.0     73.6     80.9     80.2
 12      86.0     74.6     80.6     80.4
 13      86.1     74.9     80.3     80.4
 14      86.1     75.3     80.0     80.5
FIG. 4-43 CSP CLASSIFICATION ACCURACY ESTIMATE VS. GRID SIZE
VARIABLE MEAN. CASE 17
TABLE 4-32 PERCENT CLASSIFICATION ACCURACIES OBTAINED BY CSP
ESTIMATION TECHNIQUE. CASE 18

$G_s$   $\hat{P}_{c|\omega_1}$   $\hat{P}_{c|\omega_2}$   $\hat{P}_{c|\omega_3}$   $\hat{P}_c$
  4      75.8     69.6     77.8     74.4
  5      83.2     72.4     78.5     78.0
  6      83.6     71.7     77.7     77.7
  7      89.6     74.0     78.6     80.7
  8      88.6     71.8     79.4     79.9
  9      88.2     71.9     80.2     80.1
 10      87.0     73.3     81.9     80.8
 11      87.8     73.7     81.3     81.0
 12      88.0     74.9     81.0     81.3
 13      89.8     75.3     80.8     82.0
 14      89.9     75.9     81.0     82.3
FIG. 4-44 CSP CLASSIFICATION ACCURACY ESTIMATE VS. GRID SIZE.
VARIABLE MEAN. CASE 18
of the results, however, the Monte Carlo error estimate is
reported only for one sample size of about 6000. A large
sample was chosen to assure a small bias and variance.
Table 4-33 compares the CSP error estimates for the largest
grid size with the corresponding MC estimation results.
Cases 12 through 18 refer to the 7 cases listed in Table
4-25 with increasing separability. In examining the
results for the variable mean case, all the CSP
estimate properties are observed again; particularly evident
is the generally rapid rise to a steady-state value followed
by a small-magnitude oscillation. The curve corresponding
to the Class 1 classification accuracy generally moves as
expected. The separability increase by translation of
the Class 1 mean along $x_1$ also improves $\hat{P}_{c|\omega_2}$ and
$\hat{P}_{c|\omega_3}$, but the improvement is not as great.
$\hat{P}_{c|\omega_1}$ increased 18% from Case 12 to Case 18, while in
the same range $\hat{P}_{c|\omega_2}$ and $\hat{P}_{c|\omega_3}$ improved 1.5%
and 4.1%, respectively. The comparison of CSP and MC
estimation results reveals that the differential between
them is again small. For $\hat{P}_c$, the difference ranges from a
high of 0.9% for Case 6 to a low of 0% for Case 4, Table
4-33.
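The improvement figures quoted above can be checked with a few lines of arithmetic. The sketch below uses the final-grid ($G_s$ = 14) CSP accuracies of Cases 12 and 18 as tabulated in Tables 4-26 and 4-32; the dictionary keys and the helper name are illustrative choices, not notation from the text.

```python
# Final-grid (G_s = 14) CSP accuracies in percent, from Tables 4-26 and 4-32.
case_12 = {"w1": 72.1, "w2": 74.4, "w3": 76.9}
case_18 = {"w1": 89.9, "w2": 75.9, "w3": 81.0}


def improvement(first, last):
    """Per-class accuracy gain in percentage points, rounded to one decimal."""
    return {k: round(last[k] - first[k], 1) for k in first}


gains = improvement(case_12, case_18)
print(gains)
```

The Class 1 gain of 17.8 points rounds to the 18% quoted in the text, and the other two gains match the quoted 1.5% and 4.1%.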


4.2.3 Classification Error Estimation When 
the Bayes Rate is Known 

Throughout this validation process the missing element 
has always been a fixed reference point in the form of a 
TABLE 4-33 COMPARISON OF CSP AND MC PERCENT CLASSIFICATION ACCURACY.
VARIABLE MEAN.

        $P_{c|\omega_1}$        $P_{c|\omega_2}$        $P_{c|\omega_3}$        $P_c$
Case    CSP     MC      CSP     MC      CSP     MC      CSP     MC
  1     72.1    72.5    74.4    74.4    76.9    76.9    74.5    74.6
  2     75.0    76.2    74.5    74.6    77.4    78.3    75.6    76.3
  3     79.6    78.8    74.6    76.4    78.4    78.7    77.5    78.0
  4     82.3    81.3    74.7    74.8    79.2    80.1    78.7    78.7
  5     84.0    84.9    75.0    75.7    79.5    80.1    79.5    80.7
  6     86.1    87.1    75.3    74.9    78.0    80.1    79.8    80.7
  7     89.9    89.0    75.9    76.0    81.0    80.9    82.3    82.0
known Bayes error. When only two classes are present in the
data set, however, the desired quantity has been computed
for up to an eight dimensional feature space [33]. The
availability of this result provides two significant
properties: (a) the reference error is sample-independent;
(b) by working in a two dimensional subspace, large grid
sizes, impractical in higher dimensions, can be employed to
observe the limiting behavior of the CSP algorithm and, more
importantly, considerable insight into the geometry of the grid
dynamics can be gained by actually displaying the domains
of integration.

The variation of the probability of correct classification
vs. grid size using the CSP estimation technique for
a two dimensional feature space is shown in Fig. 4-45.
The reported Bayes classification accuracies are
superimposed on the plot and serve as asymptotes:

$$P_{c|\omega_2} = 90.7\%, \qquad P_c = 94.0\% \qquad (4\text{-}9)$$

The grid size ranges from a coarse 5 cells per axis to a
very fine 75 cells per axis.

The behavior of the classification accuracy is somewhat
different from the previous test cases. Two noticeable
features are slower convergence and oscillations around
FIG. 4-45 COMPARISON OF BAYES AND CSP CLASSIFICATION ACCURACY
ESTIMATES. TWO FEATURES 
the asymptotes. The slower convergence can be traced to
the property that the error estimate's variance is inversely
proportional to the per-axis cell number and exponentially
to the dimensionality of the feature space. For example, on
the basis of the total number of cells within the grid, 12
cells per axis in 2 dimensions, where $\hat{P}_{c|\omega_1}$ has its highest
surge, corresponds to somewhere between 3 and 4 cells per
axis in 4 dimensional space, and the corresponding numbers
are only 8 and 9 for 75 two dimensional cells per axis.
Another property not observed in the variable scatter or
variable mean cases is the dissimilarity between the
functional form of $\hat{P}_{c|\omega_1}$ and $\hat{P}_{c|\omega_2}$. In the 3 dimensional
feature space examined before, all the estimates showed a
similar variational form with the grid size. In this case
$\hat{P}_{c|\omega_2}$ exhibits periodic overshoots at 11-12, 15-16, 26-27,
43-44, etc. cells per axis. These oscillations are of the same
nature as described in sec. 4.2.1. In this case, however,
it is possible to get a close-up of the actual estimation
process. Consider the 15-16 jump. The two dimensional areas
of integration are shown in Fig. 4-46. Take one scan line
going through the domain (dotted region). This line is
marked with cell centers for three different grid sizes,
$G_s$ = 15, 16, 17, Fig. 4-47. Denote two of these boundary cells
by $x_b$ and $x_b'$ and let us follow their movement as the grid size
increases. For $G_s$ = 15, $x_b$ = 0.54 and $x_b'$ = 2.7 are located
inside and outside of the domain. Recall that this
domain is multiply connected. Therefore, the estimated
locations of the boundaries are in the intervals

$$0 < x < x_b = 0.54, \qquad 2.14 < x < x_b' = 2.7 \qquad (4\text{-}10)$$

For $G_s$ = 16, $x_b$ and $x_b'$ have moved to 0.52 and 2.58,
respectively, and the boundary location is now narrowed to the
domain

$$0 < x < x_b = 0.52, \qquad 2.14 < x < x_b' = 2.58 \qquad (4\text{-}11)$$

$x_b$ and $x_b'$ are still one outside and one inside, respectively.
The next grid size, $G_s$ = 17, is where $\hat{P}_{c|\omega_2}$ takes on a rapid
jump. Now the boundary is determined to lie in the
interval

$$0.5 < x < 1, \qquad 2.5 < x < 3.0 \qquad (4\text{-}12)$$

Comparing (4-12) with (4-11) establishes that the boundary
must lie in the narrow interval

$$0.5 < x < 0.52, \qquad 2.5 < x < 2.6 \qquad (4\text{-}13)$$

In this step, however, $x_b$ = 0.5 has moved from outside to
inside and $x_b'$ = 2.5 made a similar move to outside of the
integration domain. Recalling the discussion on the
estimator's bias in sec. 2.4.1 shows that in this
transition a net positive gain has occurred in the
Fig. 4-47 One Dimensional Grid Dynamics Illustration
estimation of the volume under $f(X|\omega_2)$, hence the surge
in $\hat{P}_{c|\omega_2}$.
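The boundary-cell mechanism described above can be illustrated with a one dimensional toy problem. The sketch below is not the CSP algorithm itself: it simply estimates the length of an interval by counting the cell centers that fall inside it, so that the estimate jumps whenever a center crosses a boundary as the grid size changes. The extent [0, 3] and the interval [0.5, 2.5] are hypothetical values chosen for illustration.

```python
def cell_count_estimate(lo, hi, extent, cells):
    """Estimate the length of [lo, hi] inside [0, extent] by counting
    the cell centers of a uniform grid that fall inside the interval."""
    width = extent / cells
    centers = [(k + 0.5) * width for k in range(cells)]
    inside = sum(1 for c in centers if lo <= c <= hi)
    return inside * width


# Hypothetical interval of true length 2.0 on a scan line of extent 3.0.
for cells in range(14, 19):
    print(cells, round(cell_count_estimate(0.5, 2.5, 3.0, cells), 3))
```

In this toy setting the error never exceeds one cell width, but its sign and size change from one grid size to the next, which is the oscillation mechanism the text attributes to the boundary cells.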

The comparison of the estimation results in 4 dimensions
with the one just obtained can shed some light on the effect
of data dimensionality on the performance of the CSP
algorithm. Using 4 features, the reported Bayes classification
accuracies are [33]

$$P_{c|\omega_1} = 97.4\%, \qquad P_{c|\omega_2} = 95.0\%, \qquad P_c = 96.2\% \qquad (4\text{-}14)$$

The results are shown in Fig. 4-48. Note that the
functional form of this estimate is much more like the cases
studied in a 3 dimensional feature space, and the oscillation
property is considerably less pronounced than its two
dimensional counterpart due to the higher feature space
dimensionality. The final values are all within 0.1% of
the reported classification accuracy. In fact,
considering that (4-14) is given to only one decimal place,
the differential can be attributed to round-off,
and the estimates and their asymptotes may well be identical.
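The equal-cell-count comparison between the 2 and 4 dimensional grids used in this section can be reproduced directly. The sketch below assumes only that the total number of cells is the per-axis count raised to the dimensionality; the function name is an illustrative choice.

```python
def equivalent_cells_per_axis(cells_per_axis, dim_from, dim_to):
    """Per-axis cell count in dim_to dimensions that gives the same total
    number of cells as cells_per_axis in dim_from dimensions."""
    total_cells = cells_per_axis ** dim_from
    return total_cells ** (1.0 / dim_to)


for g in (12, 75):
    print(g, round(equivalent_cells_per_axis(g, 2, 4), 2))
```

The two results fall between 3 and 4 and between 8 and 9 cells per axis, respectively, in agreement with the figures quoted in the text.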


4.3 Classification Accuracy Estimates Using
Landsat and Aircraft MSS Data
The performance of the CSP estimation technique has
been extensively investigated using simulated data. That
study was intended to verify the proper operation of the
algorithm. The actual application of this technique,
however, is the estimation of the probability of error in the
classification of various cover types present in
multispectral remotely sensed data. The currently operational
Landsat-2 gathers information in 4 spectral bands.
Therefore, the feature space is a moderate 4 dimensional domain
where the CSP algorithm can effectively operate. In
certain conditions some of the bands may be deemed redundant
and thus a subset of the available 4 may be used. Three
test regions were selected providing different numbers
of cover types and classification error rates: Ogle County,
Illinois; Grant County, Kansas; and Graham County, Kansas.

The results of the parametric error estimator are
compared with those of LARSYS, a data analysis and
classification technique developed at the Purdue University
Laboratory for Applications of Remote Sensing. According to this
algorithm a set of training fields is selected for each
cover type based on ground truth information. These fields
are then used to provide the necessary statistical input
to an optimal Bayes classifier such as (2-7). The
entire frame of data is then classified by testing each pixel
using (2-7). In order to obtain an estimate of the
classification accuracy, a set of test fields is chosen and,
following the completion of the classification process, a count
estimate such as (2-1) is computed for the misclassified
Table 4-34 Percent Classification Accuracies

Obtained by CSP and LARSYS Algorithms 
Ogle County, IL 


Class No. of Samples 


411 


Soybean 
0 there 
Overall 



LARSYS % 


.3 

90.6 
94.0 

90.7 


CSP% 


91. J 
90.6 
91.2 


Table 4-35 Percent Classification Accuracies 

Obtained by CSP and LARSYS Algorithms 
Graham County, KAN 


LARSYS % 


Class No. of Samples 





AG1 


AG 2 
AG 3 

Nonfarm 

Wheat 

Overall 


762        930        3065
94.9       82.7       79.2
90.5       79.7       78.3


Table 4-37 Comparison of Percent Classification
Accuracies Obtained by CSP and LARSYS
Algorithms for Graham County Simulated Data

Class        LARSYS %    CSP %    Difference
Bare soil      77.8       78.3       0.5
Corn           91.2       91.0       0.2
Pasture        95.3       95.1       0.2
Wheat          94.2       93.9       0.3
Overall        89.6       89.6       0.0
samples. Frequently the training fields themselves are
used in the performance calculations.
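The count estimate mentioned above can be sketched as a tally of correctly labeled test pixels per class. Equations (2-1) and (2-7) are not reproduced in this section, so the following is only a generic counting estimator; the class names and pixel labels are hypothetical and are not taken from the thesis data or from LARSYS.

```python
from collections import Counter


def count_accuracy(true_labels, assigned_labels):
    """Percent of test samples, per class and overall, whose assigned
    label agrees with the ground truth label."""
    totals = Counter(true_labels)
    correct = Counter(t for t, a in zip(true_labels, assigned_labels) if t == a)
    per_class = {c: 100.0 * correct[c] / totals[c] for c in totals}
    overall = 100.0 * sum(correct.values()) / len(true_labels)
    return per_class, overall


# Hypothetical test-field pixels.
truth = ["corn", "corn", "corn", "corn", "soybean", "soybean", "other", "other"]
assigned = ["corn", "corn", "soybean", "corn", "soybean", "soybean", "other", "corn"]
per_class, overall = count_accuracy(truth, assigned)
print(per_class, overall)
```

Because the result depends on which pixels happen to fall in the test fields, an estimate of this kind is sample dependent, in contrast to the parametric CSP estimate.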

4.3.1 Ogle County, Illinois 

This data is a portion of Landsat scene 1017-16093
acquired August 9, 1972, and has a LARS runtable entry of
72037806. Three training classes were used and
classification was performed using 4 spectral bands, viz., channels
1 through 4. Table 4-34 shows the classification
accuracies obtained using both the LARSYS point classifier and
the CSP error estimation technique.

4.3.2 Graham County, Kansas 

This data set is LACIE SRS segment 1018 and has a LARS 
runtable entry of 74078500. Channels 9 through 12 which 
are the acquisition corresponding to Landsat scene 1672- 
1644, were used. Four training classes were developed 
from 229 training fields. Results are tabulated in Table 
4-35. 


4.3.3 Grant County, Kansas
This data set is LACIE SRS segment 1036 and has a LARS 
runtable entry of 74027600. Channels 5 through 8 which are 
the acquisition corresponding to Landsat scene 1655-16512, 
were used in the classification study. Five training 


Fig. 4-49 Histogram of Actual Landsat Data, Grant Co., KAN
(Class AG1, Channel 1, 0.50 - 0.60 micrometers, 793 samples)

FIG. 4-50 CSP CLASSIFICATION ACCURACY ESTIMATE FOR GRAHAM CO.
(vs. number of cells per axis)


classes were developed from 388 training fields. Results 
are tabulated in Table 4-36. 

4.3.4 Discussion of the Results 
Examining the results obtained here reveals that the
performance of the CSP error estimator is consistent with
that of sec. 4.2 using simulated data and closely matches
the MC classifier's output. As close as the CSP and LARSYS
results may look, the differential in some cases is greater
than the ones observed using artificial data. For example,
in Graham County, bare soil is classified with 65.9%
accuracy while the indicated theoretical value is 78.3%, and
pasture is classified with 94.8% accuracy vs. the expected
result of 95.1%. Similarly, in Grant County, AG1 is
classified with 52.5% accuracy vs. 59.3%. A possible
explanation, suggested by examining the histogram of the actual
data (Fig. 4-49), may lie in the validity of the assumption
about the normality of the data under study or, much more
likely, the normality of the statistics of the training
areas. In order to remove this element of uncertainty,
artificial data was generated using the Graham County
statistics. The simulated data was then reclassified using
LARSYS. No attempt was made to keep the field sizes in
the simulated data equal to those in the original set since
the purpose of this step was to generate data having
statistics as close to normal as possible. The new
classification accuracies are tabulated in Table 4-37.

The results are illuminating. While the bare soil was
classified with a 65.9% accuracy in the actual data, in the
simulated data this rate has risen to 77.8%, a gain of
almost 12%, putting it within 0.5% of the result predicted
by the CSP algorithm. Similar observations can be made about
other classes. The simulated data set being one realization
of a stochastic process makes the LARSYS results a random
quantity. In order to make sure that Table 4-37 is not
just one special case, the simulated data was regenerated
three times using a different starting point for the
pseudo-random number generator. The results shown in Table 4-38
confirm the preceding observations since the same close
match exhibited before is repeated.
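The dependence of a sample-based accuracy on the random number seed can be illustrated with a deliberately simple stand-in for the experiment above. The sketch below assumes two equally likely one dimensional Gaussian classes with unit variance, for which the Bayes accuracy has a closed form; the class separation, sample size and seeds are hypothetical and unrelated to the Graham County statistics.

```python
import math
import random


def bayes_accuracy_1d(d):
    """Exact accuracy for two equally likely unit-variance Gaussian
    classes with means 0 and d, using the midpoint threshold d / 2."""
    return 0.5 * (1.0 + math.erf(d / (2.0 * math.sqrt(2.0))))


def mc_accuracy_1d(d, n_samples, seed):
    """Sample-based accuracy estimate for the same problem."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_samples):
        label = rng.randrange(2)
        x = rng.gauss(label * d, 1.0)
        decided = 1 if x > d / 2.0 else 0
        correct += decided == label
    return correct / n_samples


exact = bayes_accuracy_1d(2.0)
estimates = [mc_accuracy_1d(2.0, 6000, seed) for seed in (1, 2, 3)]
print(round(exact, 4), [round(e, 4) for e in estimates])
```

Each seed gives a slightly different realization scattered around the fixed exact value, which is the behavior the repeated simulations are meant to expose.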

The results of the CSP error estimator are grid size
dependent. Since variations in performance as a function
of grid size were studied before, the classification
accuracies reported here for the actual data are for a
single $G_s$, usually around 12. For illustration purposes,
the Graham County data was analyzed using the stepwise
grid employed before and the results are shown in Fig. 4-50.
Note that the estimator exhibits the same properties
observed repeatedly in the earlier studies.

4.4 Concluding Remarks 

The performance of a multiclass multidimensional
parametric Bayes classifier has been tested under widely
different conditions. In all cases, the results matched
whatever reference point was available. When that reference
was the equivalent MC estimator, the CSP estimate was
within 1% of it. Considering that an MC estimate is a
random quantity, repeating the error estimation with another
sample function of the process produces a different
realization of the classification accuracy estimate. Therefore,
an averaged MC estimator is well within the 1% maximum
deviation from the CSP results. In Table 4-38, where three
realizations of the MC estimator are computed, the overall
classification accuracy estimates are 89.2%, 90.5%, and
90.0%. This compares with the fixed CSP estimate of 89.6%.
Whereas the individual differences are 0.4%, 0.9%, and 0.4%,
the averaged MC estimate, 89.9%, is 0.3% off. In fact,
this difference may be reduced even further if the number
of MC estimates that are averaged is increased.
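The averaging argument can be checked by recomputing the deviations from the three overall accuracies quoted from Table 4-38 and the fixed CSP estimate of 89.6%. The variable names below are illustrative.

```python
mc_overall = [89.2, 90.5, 90.0]   # three MC realizations, percent
csp_overall = 89.6                # fixed CSP estimate, percent

individual = [round(abs(m - csp_overall), 1) for m in mc_overall]
averaged = round(sum(mc_overall) / len(mc_overall), 1)
offset = round(abs(averaged - csp_overall), 1)
print(individual, averaged, offset)
```

The averaged estimate is closer to the CSP value than the worst single realization, which is the point being made about averaging over several sample functions.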

When the exact Bayes error rates were available, sec.
4.2.3, the CSP estimator provided essentially identical
results. It has been shown that this algorithm has uniform
performance with consistent systematic features throughout
the test cases. One possible limitation emerged: when
hyperplanes parallel to the coordinate axes form
the boundaries of the feature space, the CSP estimation
technique performs poorly due to periodic high-amplitude
overshoots that occur when a considerable number of
boundary cells shift together from the outside to the inside
of the integration domain and vice versa. However, this is
expected to be an unlikely occurrence with real data.

In conclusion, it should be kept in mind that the class- 
ification error estimation algorithm developed here was not 
intended to provide a higher quality estimate than various 
random sampling techniques. The fact that it does so in 
many cases is only incidental but justifiable. The orig- 
inal goal was the development of an estimation algorithm 
dependent on the parameters of the problem alone. To 
that end the CSP estimation procedure has met the 
objectives. 
CHAPTER 5

Experimental Evaluation of the MSS Spatial Model 

This chapter is aimed at the validation and analysis 
of the scanner spatial model developed in chapter 3. 
Successful accomplishment of this task enables the inte- 
gration of this model and the parametric Bayes error 
estimator as a complete set of tools for the evaluation 
of the performance of a MSS for any set of specified 
parameters. From Fig. 4-1 where the entire simulation pro- 
cess is depicted, it is seen that there are three phases 
involved. (i) Validation of the scanner characteristic 
function by comparing the output spectral covariance matrix 
of a convolution operator with that of the scanner linear 
system model. The input to the former is a simulated or 
real data set and to the latter is the statistical and 
spatial parameters of such a set. The results should 
closely match. (ii) Introduction of additive random 
Gaussian noise at the scanner input and output. (iii) Com- 
parison of the probability of correct classification at the 
input to that of various output stages with noise power, 
scanner IFOV and data spatial structure as variables. 

Before embarking on the experiment, it is necessary to 
develop a suitable simulated data set. 
5.1 Description of the Data Base
Stage II of the test data base simulation starts here.
The checkout of the CSP error estimator required
specification of the spectral characteristics of the data alone due
to the fact that the spatial information is transparent
to the Bayes spectral classification algorithm. The
validation of the scanner model requires further
conditioning of the stage I simulation output.

The 'white noise' property of the available test data,
although insignificant previously, would no longer be a
realistic assumption about the multispectral data. In
particular, the scanner's response is quite sensitive to
the spatial structure of the input process (Figs. 3-5 through
3-12). It has been shown in sec. 3.3.2 that a Markov model
closely approximates the spatial correlation of the
multispectral data. Therefore, stage II of the simulation
process consists of an additional transformation on the
existing data base as a means of creating an exponential
correlation property with any desired parameters. The
technique to accomplish this task is formulated in the
discrete domain in Appendix B. It is shown that filtering
of a white noise process where the filter's PSF is a two
dimensional one sided exponential,

$$h(x,y) = c^2 e^{-x/r_x} e^{-y/r_y}, \qquad x, y > 0 \qquad (5\text{-}1)$$
generates a two dimensional random field with the adjacent
sample and line correlations given by

$$\rho_x = e^{-1/r_x}, \qquad \rho_y = e^{-1/r_y} \qquad (5\text{-}2)$$

respectively. In addition to the correlation generating
property, (5-1) inevitably alters the spectral structure
of the input process. From (B-TJ) the output variance
associated with any spectral band is given by

(5-3)

where $N_0$ is the filter's PSF length in pixels, $\sigma_f^2$ is the
variance of the input process, $\sigma_g^2$ the corresponding output
quantity and $W(0,0)$ a quantity depending on $r_x$, $r_y$ and $N_0$.
(5-3) approaches its continuous version for large $N_0$, which
is

$$\sigma_g^2 = \frac{1}{4 r_x r_y}\,\sigma_f^2 \qquad (5\text{-}4)$$

In chapter 3 it was pointed out that the magnitude of the
variance reduction is large when a white noise process is
transmitted through a MSS. Since the exponential filter
is basically a linear system, the same property is observed
in (5-4). For example, in order to generate a data set
with the following somewhat typical correlation structure:

$$\rho_x = 0.85, \qquad \rho_y = 0.75 \qquad (5\text{-}5)$$

it is required that

$$r_x = 6.15, \qquad r_y = 3.47 \qquad (5\text{-}6)$$

From (5-4)

$$\sigma_g^2 / \sigma_f^2 = 0.012 \qquad (5\text{-}7)$$

and thus the output variance is slightly over 1% of the
input variance. This small fraction causes practical
problems in generating the desired data set due to the finite
dynamic range of the digital data on the storage medium.
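The numbers in (5-5) through (5-7) can be reproduced with a short calculation. The sketch below reads (5-2) as $\rho = e^{-1/r}$ and the continuous limit (5-4) as $\sigma_g^2/\sigma_f^2 = 1/(4 r_x r_y)$; this reading is an assumption made here, supported by the fact that it recovers the quoted values. The function names are illustrative.

```python
import math


def characteristic_length(rho):
    """Invert rho = exp(-1/r) for the filter's characteristic length r."""
    return -1.0 / math.log(rho)


def continuous_variance_ratio(r_x, r_y):
    """Output-to-input variance ratio in the continuous limit."""
    return 1.0 / (4.0 * r_x * r_y)


r_x = characteristic_length(0.85)
r_y = characteristic_length(0.75)
ratio = continuous_variance_ratio(r_x, r_y)
print(round(r_x, 2), round(r_y, 2), round(ratio, 3))
```

The computed lengths agree with (5-6) to within rounding in the second decimal, and the ratio rounds to the 0.012 of (5-7).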

The representation of the problem in the discrete domain,
however, provides the length of the filter as another
variable to control the ratio expressed in (5-4). Let
$N_0$ = 5 and let $r_x$ and $r_y$ be as specified in (5-6). Then from
(5-3)

$$\sigma_g^2 / \sigma_f^2 = 0.048 \qquad (5\text{-}8)$$

The resulting intersample and interline correlations are now
0.65 and 0.53, respectively. The exponentially correlated
data base is generated with adequate $N_0/r_x$ and $N_0/r_y$ ratios
to closely approximate the continuously derived results.
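The effect of truncating the filter on the variance ratio can be sketched numerically. Appendix B and the exact form of (5-3) are not part of this section, so the following rests on two assumptions made here: the exponential is sampled at taps 0 through $N_0 - 1$ on each axis, and the filter is normalized to unit DC gain. With these assumptions the sketch lands close to the 0.048 of (5-8); it does not attempt to reproduce the quoted correlations of 0.65 and 0.53.

```python
import math


def truncated_variance_ratio(r_x, r_y, n_taps):
    """Output-to-input variance ratio for white noise passed through a
    separable, truncated one sided exponential filter with unit DC gain
    (assumed normalization)."""
    def one_axis(r):
        taps = [math.exp(-k / r) for k in range(n_taps)]
        return sum(t * t for t in taps) / sum(taps) ** 2
    return one_axis(r_x) * one_axis(r_y)


ratio_short = truncated_variance_ratio(6.15, 3.47, 5)
print(round(ratio_short, 3))
```

Shortening the filter raises the ratio well above the continuous value of about 0.012, which is the control over dynamic range that the text describes.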
5.2 Evaluation of the Scanner Characteristic Function

The scanner characteristic function, the transfer
function that establishes a parametric/analytical relationship
between the input and output of a multiband MSS, is the
primary means by which various interactive processes within the
scanner are studied. As with every other model developed so
far, it is desirable to establish that near-identical results
are obtained using empirical techniques. In Fig. 4-1, this
validation process is laid out. A white noise process
with some prescribed statistics is generated and then
conditioned to exhibit a specified pixel-to-pixel correlation.
The actual data is transformed by a convolution operator
having the PSF of the desired scanner and then the output
statistics are estimated. The statistics of the same input
process are operated on by the scanner characteristic
function and the output statistics directly computed. The
comparison of the two resulting covariance and correlation
matrices will produce the required result.
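The empirical-versus-parametric comparison described above can be sketched in one dimension for the simplest case of an uncorrelated input. This is not the scanner characteristic function of chapter 3, which also accounts for scene correlation; it only shows the two routes side by side. The Gaussian PSF form, its width parameter, the tap count and the seed are all assumptions made for illustration.

```python
import math
import random


def gaussian_psf(r0, half_width):
    """Sampled Gaussian PSF normalized to unit sum (assumed form)."""
    taps = [math.exp(-(k * k) / (2.0 * r0 * r0))
            for k in range(-half_width, half_width + 1)]
    total = sum(taps)
    return [t / total for t in taps]


def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def empirical_ratio(psf, n_samples, seed):
    """Filter simulated white noise and estimate output/input variance."""
    rng = random.Random(seed)
    f = [rng.gauss(0.0, 30.0) for _ in range(n_samples)]
    g = [sum(h * f[i + j] for j, h in enumerate(psf))
         for i in range(n_samples - len(psf) + 1)]
    return variance(g) / variance(f)


def parametric_ratio(psf):
    """Variance ratio computed from the PSF alone, for white noise input."""
    return sum(h * h for h in psf)


psf = gaussian_psf(1.0, 4)
print(round(empirical_ratio(psf, 20000, 7), 3), round(parametric_ratio(psf), 3))
```

The first number depends on the simulated sample while the second depends only on the filter parameters, mirroring the two branches of the validation process.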

For this test the particular choice of the input
statistics is relatively unimportant. Therefore, in order
to use the data already available, Class 1 in Case 1
listed in Table 4-1 is selected as test data:

$$\Sigma_{f_1} = \begin{bmatrix} 1.0 & 0.75 & 0.15 \\ & 1.0 & 0.45 \\ & & 1.0 \end{bmatrix} \qquad (5\text{-}9)$$
The channel standard deviations are set at a large

$$\sigma_i = 30, \qquad i = 1, 2, 3 \qquad (5\text{-}10)$$

to cope with two successive variance-reducing linear
transformations. The variables of the problem are the scene
correlation and the scanner IFOV. The IFOV is defined
as the angle at the scanner subtended by a resolution
element on the ground; e.g., 87 μrad for the Landsat
MSS. This definition, when based on a Gaussian PSF, is not
unique. One convention that has been used [3] defines
the IFOV as the angle between points where the PSF has
dropped to half its peak amplitude, Fig. 5-1. Throughout
this chapter the definition adopted is such that the IFOV
and the characteristic length, $r_0$, are identical.

Scanner systems with different resolution capabilities 
and different signal sampling intervals produce images 
with different adjacent pixel separation. In order to 
eliminate the dependency of the problem formation on the 
actual physical distance between each resolution element, 
the spatial parameters are normalized to that quantity 
and thus many of the results are on a per pixel basis. 

Later in the experiment alternate conventions are defined 
based on the particular problem under study. According 
to this definition, pixel separation in effect is unity. 
This assumption is particularly relevant in the simulation 


Fig. 5-1 Conceptual Illustration of a Picture
Element Viewed from the Satellite.
stages of the experiment since data is artificially
generated and one can assign any desired quantity to the 
samples and lines separation interval. 

In order to experimentally verify the theoretical
variations of the scanner characteristic function a test
data set is required. The adjacent sample correlation is
scanned from 0.6 to 0.9 with an increment of 0.05 while
the adjacent line correlation is kept at 0.7 for the first
set and 0.8 for the second. For test purposes two sets of
scanner PSF's with $r_0$ = 1 and 4 pixels are selected. The
particular choice of these parameters is again somewhat
arbitrary. An attempt was made, however, to make the
selections realistic in terms of practical systems.
For each adjacent sample correlation, adjacent line
correlation and scanner IFOV, the ratio of the output
variance to the input variance is experimentally
determined and the results superimposed on the theoretical
plot of the characteristic function vs. scene correlation.
This is done for one spectral band and the results are
shown in Figs. 5-2 and 5-3. The percent difference between
the theoretical and experimental characteristic functions,
$e_i$, is expressed in Tables 5-1 through 5-4, where $W_i$
is the functional value for the ith sample-to-sample
correlation in the corresponding table.
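The percent differences in Table 5-1 can be recomputed from its two columns of characteristic function values. The defining expression for the percent difference is not legible in this section; the sketch below assumes it is the absolute difference normalized by the second of the two W columns, an inference that reproduces the tabulated entries. The function and variable names are illustrative.

```python
def percent_difference(w_first, w_second):
    """Absolute difference as a percentage of the second value
    (normalization inferred from the tabulated entries)."""
    return 100.0 * abs(w_first - w_second) / w_second


# The two W columns of Table 5-1 (IFOV = 1, adjacent line correlation 0.70).
w_first = [0.56, 0.58, 0.60, 0.62, 0.63, 0.65, 0.65]
w_second = [0.54, 0.56, 0.59, 0.62, 0.65, 0.68, 0.70]

errors = [percent_difference(a, b) for a, b in zip(w_first, w_second)]
average = sum(errors) / len(errors)
print([round(e, 1) for e in errors], round(average, 1))
```

Under this assumed definition the largest error, the smallest error and the average agree with the 7.1%, 0% and 3.4% cited for this case.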

Examination of the error terms and the accompanying 
figures indicates a relatively close match between the two 
TABLE 5-1 EXPERIMENTAL AND THEORETICAL SCANNER CHARACTERISTIC
FUNCTION. IFOV = 1, ADJACENT LINE CORRELATION = 0.70

$\rho_x$    $W_i$ (exp.)    $W_i$ (theor.)    $e_i$ %
0.60       0.56           0.54             3.70
0.65       0.58           0.56             3.60
0.70       0.60           0.59             1.70
0.75       0.62           0.62             0.0
0.80       0.63           0.65             3.10
0.85       0.65           0.68             4.40
0.90       0.65           0.70             7.10


TABLE 5-2 EXPERIMENTAL AND THEORETICAL SCANNER CHARACTERISTIC
FUNCTION. IFOV = 4, ADJACENT LINE CORRELATION = 0.70

$\rho_x$    $W_i$ (experimental)    $W_i$ (theoretical)    $e_i$ (%)
0.60        0.12                    0.14                    14.30
0.65        0.13                    0.16                    18.70
0.70        0.15                    0.18                    16.60
0.75        0.18                    0.20                    10.00
0.80        0.20                    0.23                    13.00
0.85        0.23                    0.27                    14.80
0.90        0.28                    0.31                    9.70
TABLE 5-3 EXPERIMENTAL AND THEORETICAL SCANNER CHARACTERISTIC
FUNCTION. IFOV = 1, ADJACENT LINE CORRELATION = 0.80

$\rho_x$    $W_i$ (experimental)    $W_i$ (theoretical)    $e_i$ (%)
0.60        0.61                    0.56                    8.90
0.65        0.62                    0.62                    0.0
0.70        0.65                    0.65                    0.0
0.75        0.66                    0.68                    2.90
0.80        0.68                    0.71                    4.20
0.85        0.68                    0.74                    8.10
0.90        0.70                    0.78                    10.20


TABLE 5-4 EXPERIMENTAL AND THEORETICAL SCANNER CHARACTERISTIC
FUNCTION. IFOV = 4, ADJACENT LINE CORRELATION = 0.80

$\rho_x$    $W_i$ (experimental)    $W_i$ (theoretical)    $e_i$ (%)
0.60        0.15                    0.18                    16.60
0.65        0.17                    0.21                    19.00
0.70        0.20                    0.24                    16.60
0.75        0.23                    0.27                    14.80
0.80        0.26                    0.31                    16.10
0.85        0.31                    0.35                    11.40
0.90        0.37                    0.41                    9.70
independently derived functions. When $\rho_y = 0.7$ and
$r_o = 1$ pixel, the percent error ranged from a high of 7.1%
at $\rho_x = 0.9$ to 0% at $\rho_x = 0.75$ for an average of 3.4%. For
$r_o = 4$ pixels the percent error ranged from a high of 18.7%
at $\rho_x = 0.65$ to 9.7% at $\rho_x = 0.9$ for an average of 13.9%.

The higher discrepancy between the
theoretical and experimental values in the latter case
can be attributed to the inherent error of a discrete
testing of an essentially continuous phenomenon. While
this error is always present, under certain unfavorable
conditions it may become significant. In this case a large
IFOV dictates the choice of a PSF with a considerably
greater number of samples in order to satisfactorily
approximate its continuous counterpart. This in turn
requires a larger data base and an accompanying increase
in computation time. The last factor was the main
constraint that limited the PSF's length and contributed
to the increase in deviation from the theoretical result.
This factor notwithstanding, Fig. 5-2 and 5-3 show a very
acceptable agreement between the two results and provide
substantial evidence for the validity of the analytic
scanner characteristic function.

This validation was accomplished in the context of
variances alone. That this is not a special case is
easily concluded from the property that the output cross-
channel spectral correlation coefficients are simply ratios
of one or several appropriate characteristic functions.
Having tested its building block, the experimental
verification of the entire spectral correlation matrix is
implicitly accomplished.

By means of the above evaluation the validation of the
parametric models developed for the analysis of the MSS
performance is accomplished. Therefore, unless stated
otherwise, all the results obtained hereafter will be based
entirely on the statistical properties of the populations,
scanner parameters etc., and no data bases, simulated or
measured, will be employed.

5.3 MSS and Classifiability of the Multispectral Data

A major application of the various parametric models
and methods developed during this study is in determining
the interactions among the MSS system parameters on a
data-independent basis. Having experimentally verified
the validity of the models, such an evaluation of the
performance of a multispectral scanner is feasible. In
any system analysis the definition of an index of performance
is a basic requirement. When the system is an MSS in a
remote sensing data gathering package, the accuracy with
which the various populations present in the final data set
are classified primarily determines the degree of success
of the initial design. Therefore, throughout this chapter
the objective is to observe the probability of correct
classification at various stages of the MSS, Fig. 5-4,
and monitor its variations with the SNR, scanner IFOV
and spatial correlation of the scene. A Gaussian PSF
is employed unless stated otherwise.

5.3.1 Classification Accuracies at the MSS Output: No Noise 

The test statistics are those of Case 1 in Table 4-1,
containing 3 classes with 3 features. The input to the
scanner is a spatially correlated data set with an adjacent
sample correlation $\rho_x$ ranging from 0.5 to 0.95 in steps of
0.05. For each $\rho_x$ a corresponding $\rho_y$ is computed on the
following basis. The sampling of the analog Landsat data
is such that the ratio of the ground distance between the
cross-track pixels to that of along-track is about 0.7.
Since the adjacent pixel correlations along these two
directions are equal in a continuous model, it follows
that if

$$\rho_x^{\tau} = e^{-a\tau}, \qquad \tau, \eta = 0, 1, 2, \ldots$$ (5-11)

$$\rho_y^{\eta} = e^{-b\eta}$$ (5-12)

then

$$a = 0.7\,b$$ (5-13)

therefore

$$\rho_y = \rho_x^{(10/7)}$$ (5-14)
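
As a quick numerical illustration of (5-14), the sketch below lists the ten input cases and the adjacent line correlation implied by each adjacent sample correlation. The function name `line_correlation` and the `spacing_ratio` parameter are hypothetical names introduced here for illustration; only the 0.7 spacing ratio and the range of $\rho_x$ come from the text.

```python
def line_correlation(rho_x, spacing_ratio=0.7):
    """Adjacent line correlation implied by (5-14): rho_y = rho_x ** (10/7).

    spacing_ratio is the cross-track to along-track pixel spacing ratio
    (about 0.7 for the Landsat sampling described in the text).
    """
    return rho_x ** (1.0 / spacing_ratio)


# The ten adjacent sample correlations: 0.50, 0.55, ..., 0.95.
cases = [round(0.50 + 0.05 * k, 2) for k in range(10)]

for rho_x in cases:
    print(f"rho_x = {rho_x:.2f}  ->  rho_y = {line_correlation(rho_x):.3f}")
```

For $\rho_x = 0.85$ this gives a line correlation of about 0.79, the pair of values used in the noise study of sec. 5.3.2.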


With the input statistics defined as above, 10 cases are
obtained and for each case $r_o$ is varied from 1 to 8 pixels
and an output classification accuracy is estimated for
each combination of the scene correlation and scanner IFOV.
Fig. 5-5 through 5-14 and Tables 5-5 through 5-14 show
the variation of the output probabilities of correct
classification as a function of IFOV. 13 cells per axis
are used in the CSP error estimation algorithm.

The variations of the output probabilities of correct
classification are in complete agreement with those projected
by the characteristic function. The most notable feature
is the inverse relationship between the scene spatial
correlation and the slope of $\hat{P}_{c|\omega_i}$ vs. IFOV at the output.
When the scene is spatially highly uncorrelated, as in
Fig. 5-5, $\hat{P}_c$ gained 11.2% by increasing the IFOV from 1
to 2 pixels wide, whereas the same increase in IFOV
produced a gain of 9.7% for $\rho_x = 0.6$, 6.7% for $\rho_x = 0.7$,
3.3% for $\rho_x = 0.8$ and only 0.9% when $\rho_x = 0.95$. This
behavior can be predicted from the variations of $W_s$ vs.
$\rho_x$. Referring to Fig. 3-5 through 3-12 where $W_s$ is plotted,
it is observed that the one step reduction in input
variance gets progressively smaller toward higher scene
correlations. For the test case under study, where any
reduction of the class variances along a feature axis can
contribute to increased separability, the aforementioned
property of $W_s$ accounts for the changing slope of $\hat{P}_{c|\omega_i}$
over the ensemble of the scene spatial correlations.


TABLE 5-5 SCANNER OUTPUT CLASSIFICATION ACCURACIES VS. IFOV
ADJACENT SAMPLE CORRELATION = 0.50
(accuracies in percent)

IFOV    $\hat{P}_{c|\omega_1}$    $\hat{P}_{c|\omega_2}$    $\hat{P}_{c|\omega_3}$    $\hat{P}_c$
1       68.9                      74.2                      76.6                      73.2
2       82.4                      86.0                      84.9                      84.4
3       91.5                      94.3                      92.7                      92.8
4       96.5                      97.8                      97.0                      97.1
5       98.7                      99.2                      98.9                      99.0
6       99.6                      99.8                      99.7                      99.7
7       99.9                      99.9                      99.9                      99.9
8       99.9                      99.9                      99.9                      99.9
TABLE 5-6 SCANNER OUTPUT CLASSIFICATION ACCURACIES VS. IFOV
ADJACENT SAMPLE CORRELATION = 0.55
(accuracies in percent)

IFOV    $\hat{P}_{c|\omega_1}$    $\hat{P}_{c|\omega_2}$    $\hat{P}_{c|\omega_3}$    $\hat{P}_c$
1       66.5                      70.6                      75.6                      (illegible)
2       78.3                      83.7                      82.6                      81.6
3       87.7                      91.6                      89.8                      89.7
4       94.2                      95.9                      94.8                      95.0
5       97.3                      98.4                      97.7                      97.8
6       98.9                      99.4                      99.1                      99.1
7       99.6                      99.8                      99.7                      99.7
8       99.8                      99.9                      99.9                      99.9
TABLE 5-7 SCANNER OUTPUT CLASSIFICATION ACCURACIES VS. IFOV
ADJACENT SAMPLE CORRELATION = 0.60
(accuracies in percent)

IFOV    $\hat{P}_{c|\omega_1}$    $\hat{P}_{c|\omega_2}$    $\hat{P}_{c|\omega_3}$    $\hat{P}_c$
1       64.4                      68.5                      74.7                      69.2
2       75.3                      81.2                      80.2                      78.9
3       84.2                      87.7                      86.8                      86.2
4       91.2                      93.7                      92.3                      92.4
5       95.2                      96.8                      95.8                      95.9
6       97.5                      98.6                      98.0                      98.0
7       98.9                      99.4                      99.1                      99.1
8       99.5                      99.7                      99.6                      99.6
TABLE 5-8 SCANNER OUTPUT CLASSIFICATION ACCURACIES VS. IFOV
ADJACENT SAMPLE CORRELATION = 0.65
(accuracies in percent)

IFOV    $\hat{P}_{c|\omega_1}$    $\hat{P}_{c|\omega_2}$    $\hat{P}_{c|\omega_3}$    $\hat{P}_c$
1       63.1                      66.8                      73.5                      67.8
2       72.3                      78.6                      77.9                      76.3
3       80.5                      84.8                      83.8                      83.0
4       86.7                      90.4                      89.0                      88.7
5       92.1                      94.6                      93.1                      93.3
6       95.4                      96.9                      95.9                      96.1
7       97.5                      98.5                      97.7                      97.9
8       98.7                      99.2                      98.9                      98.9
TABLE 5-9 SCANNER OUTPUT CLASSIFICATION ACCURACIES VS. IFOV
ADJACENT SAMPLE CORRELATION = 0.70
(accuracies in percent)

IFOV    $\hat{P}_{c|\omega_1}$    $\hat{P}_{c|\omega_2}$    $\hat{P}_{c|\omega_3}$    $\hat{P}_c$
1       61.3                      65.8                      73.6                      66.9
2       69.4                      75.0                      76.6                      73.7
3       75.5                      82.9                      81.2                      79.9
4       83.0                      86.7                      85.7                      85.1
5       87.6                      91.6                      89.8                      89.7
6       91.9                      94.6                      92.9                      93.1
7       94.6                      96.2                      95.5                      95.4
8       96.7                      98.1                      97.2                      97.3
FIG. 5-9 SCANNER OUTPUT CLASSIFICATION ACCURACY VS. IFOV
ADJACENT SAMPLE CORRELATION = 0.7
(Plot: classification accuracy in percent vs. scanner IFOV in
high resolution pixels; curves for Class 1, Class 2, Class 3
and overall.)
TABLE 5-10 SCANNER OUTPUT CLASSIFICATION ACCURACIES VS. IFOV
ADJACENT SAMPLE CORRELATION = 0.75
(accuracies in percent)

IFOV    $\hat{P}_{c|\omega_1}$    $\hat{P}_{c|\omega_2}$    $\hat{P}_{c|\omega_3}$    $\hat{P}_c$
1       59.5                      64.2                      73.2                      65.7
2       66.4                      69.9                      75.7                      70.7
3       72.3                      78.6                      77.9                      76.3
4       78.0                      83.2                      81.7                      81.0
5       83.0                      86.7                      85.9                      85.2
6       87.2                      90.4                      89.2                      89.0
7       91.0                      93.6                      92.2                      92.2
8       93.3                      95.3                      94.2                      94.3
TABLE 5-11 SCANNER OUTPUT CLASSIFICATION ACCURACIES VS. IFOV
ADJACENT SAMPLE CORRELATION = 0.80
(accuracies in percent)

IFOV    $\hat{P}_{c|\omega_1}$    $\hat{P}_{c|\omega_2}$    $\hat{P}_{c|\omega_3}$    $\hat{P}_c$
1       58.2                      63.1                      72.5                      64.6
2       63.4                      66.8                      73.5                      67.9
3       68.5                      73.3                      76.6                      72.8
4       72.4                      78.6                      78.7                      76.6
5       77.9                      83.0                      81.4                      80.8
6       82.4                      85.6                      84.7                      84.2
7       84.9                      88.2                      87.1                      86.7
8       87.6                      91.6                      89.8                      89.7
TABLE 5-12 SCANNER OUTPUT CLASSIFICATION ACCURACIES VS. IFOV
ADJACENT SAMPLE CORRELATION = 0.85
(accuracies in percent)

IFOV    $\hat{P}_{c|\omega_1}$    $\hat{P}_{c|\omega_2}$    $\hat{P}_{c|\omega_3}$    $\hat{P}_c$
1       56.8                      62.2                      72.5                      63.8
2       60.9                      65.5                      73.4                      66.6
3       64.4                      68.2                      74.0                      68.9
4       67.9                      73.1                      76.3                      72.4
5       71.5                      78.1                      77.1                      75.6
6       74.9                      80.6                      79.5                      78.3
7       77.9                      83.1                      81.5                      80.8
8       80.7                      84.8                      83.9                      83.1
TABLE 5-13 SCANNER OUTPUT CLASSIFICATION ACCURACIES VS. IFOV
ADJACENT SAMPLE CORRELATION = 0.90
(accuracies in percent)

IFOV    $\hat{P}_{c|\omega_1}$    $\hat{P}_{c|\omega_2}$    $\hat{P}_{c|\omega_3}$    $\hat{P}_c$
1       55.0                      61.2                      73.0                      63.1
2       58.2                      63.1                      72.7                      64.7
3       60.8                      65.0                      73.3                      66.4
4       61.3                      66.5                      73.5                      67.3
5       65.4                      68.3                      74.7                      69.7
6       67.7                      71.1                      76.3                      71.7
7       69.4                      75.0                      76.8                      73.8
8       72.3                      78.6                      77.8                      76.2
TABLE 5-14 SCANNER OUTPUT CLASSIFICATION ACCURACIES VS. IFOV
ADJACENT SAMPLE CORRELATION = 0.95
(accuracies in percent)

IFOV    $\hat{P}_{c|\omega_1}$    $\hat{P}_{c|\omega_2}$    $\hat{P}_{c|\omega_3}$    $\hat{P}_c$
1       54.1                      60.3                      72.1                      62.1
2       55.0                      61.0                      73.0                      63.0
3       56.2                      62.2                      72.5                      63.6
4       57.9                      62.9                      72.7                      64.5
5       58.7                      63.3                      72.9                      65.0
6       59.7                      64.4                      73.3                      65.8
7       61.5                      65.6                      73.6                      66.9
8       61.9                      66.2                      73.5                      67.2
The second property universally observed is the
exponential type rise of $\hat{P}_{c|\omega_i}$ precipitated by the changing
slope of the curves for a fixed $\rho_x$ and $\rho_y$. This property
is brought about by the nonlinear weighting feature of $W_s$
as the IFOV is varied. Let

$$A_1 = W_s(\rho_x, \rho_y, r_1)$$

$$A_2 = W_s(\rho_x, \rho_y, r_2)$$ (5-15)

$$A_3 = W_s(\rho_x, \rho_y, r_3)$$

where $r_1$, $r_2$ and $r_3$ are three different IFOV's in increasing
order. Then,

$$|A_3 - A_2| < |A_2 - A_1|$$ (5-16)

Therefore, the classification accuracy improvement must
necessarily taper off as IFOV increases. This last property
is probably best demonstrated in Fig. 5-14 where the input
process has a high degree of spatial correlation. The
plots of $\hat{P}_{c|\omega_i}$ vs. IFOV are nearly flat with an overall
classification improvement of 5.1%. This compares with
13.2% for $\rho_x = 0.9$, 25.1% for $\rho_x = 0.8$ and 26.7% for $\rho_x = 0.5$.
For a degenerate case where $\rho_x = \rho_y = 1$, the characteristic
function indicates that input and output classification
accuracies are identical. This of course is predictable
since total spatial correlation is tantamount to a process
with only a DC value.
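
The tapering of the accuracy improvement can be read directly off the tabulated overall accuracies. The short sketch below is an illustration only, using the $\hat{P}_c$ columns of Tables 5-5 and 5-14; the helper name `successive_gains` is hypothetical.

```python
# Overall classification accuracy (percent) for IFOV = 1..8 pixels,
# copied from the last column of Table 5-5 (rho_x = 0.50) and
# Table 5-14 (rho_x = 0.95).
pc_rho_050 = [73.2, 84.4, 92.8, 97.1, 99.0, 99.7, 99.9, 99.9]
pc_rho_095 = [62.1, 63.0, 63.6, 64.5, 65.0, 65.8, 66.9, 67.2]


def successive_gains(pc):
    """Accuracy gained by each one-pixel increase in IFOV."""
    return [round(b - a, 1) for a, b in zip(pc, pc[1:])]


gains_050 = successive_gains(pc_rho_050)
total_050 = round(pc_rho_050[-1] - pc_rho_050[0], 1)
total_095 = round(pc_rho_095[-1] - pc_rho_095[0], 1)

print("gains per step, rho_x = 0.50:", gains_050)
print("overall improvement, rho_x = 0.50:", total_050)
print("overall improvement, rho_x = 0.95:", total_095)
```

For the weakly correlated scene each additional pixel of IFOV buys less than the previous one, while the highly correlated scene gains only about five points over the whole range.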
5.3.2 Classification Accuracies at the MSS Output:
Additive Gaussian Noise

In this subsection the definitions and conventions
adopted in sec. 3.3 will be adhered to throughout. In
order to study the effect of additive random noise on the
classifiability of remotely sensed data, the scanner
output class conditional statistics undergo the following
linear transformation

$$\Sigma_g \rightarrow \Sigma_g + \Sigma_N$$

where $\Sigma_g$ is the noise free output statistics and $\Sigma_N$ is
the covariance matrix of a white noise process and as
such it is also diagonal. The SNR in this case is
defined on a class conditioned basis. However, the classes
in the test case all have equal channel variances with
equal spatial correlation parameters; therefore, the
class conditional SNR is identical for all three populations.
A fixed spatial correlation model with $\rho_x = 0.85$ and $\rho_y = 0.79$
is chosen and the output probability of correct classification
vs. IFOV is estimated for SNR = 10, 20 and 30 dB. The noise
enters the system at the MSS output and models the
quantization and detector noise. Fig. 5-15 through 5-17 and
Tables 5-15 through 5-17 show the interaction of noise
and scanner IFOV and their effects on the output
classification accuracy. Fig. 5-18 shows the
dependence of $P_c$ on IFOV with SNR as a running parameter.
Note that a fixed output SNR implies a variable noise
power environment.
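
A minimal sketch of how white noise at a prescribed SNR modifies a class covariance matrix is given below. It rests on assumptions that should be checked against sec. 3.3: the SNR is taken to be the ratio of a channel's noise free output variance to the noise variance, expressed in dB, and the 3-band covariance values are hypothetical numbers chosen only for illustration. The function name `add_white_noise` is likewise hypothetical.

```python
def add_white_noise(cov, snr_db):
    """Return cov + diagonal noise covariance for a given SNR in dB.

    Assumption: SNR = (channel signal variance) / (noise variance),
    so each channel receives noise of variance sigma^2 / 10**(snr_db/10).
    """
    noisy = [row[:] for row in cov]            # copy, leave input intact
    for k in range(len(cov)):
        noise_var = cov[k][k] / (10.0 ** (snr_db / 10.0))
        noisy[k][k] += noise_var               # white noise: diagonal only
    return noisy


# Hypothetical noise free output covariance with equal channel variances.
sigma_g = [
    [4.0, 1.2, 0.8],
    [1.2, 4.0, 1.0],
    [0.8, 1.0, 4.0],
]

for snr_db in (10, 20, 30):
    noisy = add_white_noise(sigma_g, snr_db)
    # The band-to-band correlation coefficient shrinks as noise is added.
    r12 = noisy[0][1] / (noisy[0][0] * noisy[1][1]) ** 0.5
    print(f"SNR = {snr_db} dB: variance = {noisy[0][0]:.3f}, r_12 = {r12:.3f}")
```

Since the output signal variance falls as the IFOV grows, holding the SNR fixed in this way lowers the noise variance with it, which is the variable noise power environment noted above.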

The functional variation of the classification
accuracies vs. IFOV is essentially identical for
different noise levels. $\hat{P}_{c|\omega_i}$ increases monotonically with
increasing IFOV for a fixed SNR. Compared to the noise
free case of Fig. 5-12, the slopes of $\hat{P}_{c|\omega_i}$ in the noise
added case are relatively close. In the noise free case the
classification accuracies $\hat{P}_{c|\omega_1}$, $\hat{P}_{c|\omega_2}$, $\hat{P}_{c|\omega_3}$ and $\hat{P}_c$
increased 23.9%, 22.6%, 11.4% and 19.3% respectively, where the
corresponding numbers for SNR = 10 dB are 20.7%, 19.9%, 14.3%
and 18.3% as IFOV ranged from 1 to 8 pixels. The percent
improvement of the output classification accuracy vs. IFOV
therefore is not heavily dependent on the output SNR in this
case.

The deterioration of the classification accuracies as noise
power is increased is greater for larger scanner IFOV's.
This is due to the fact that the coarse resolution output
with a smaller variance is more susceptible to random
disturbances than a process that already has an appreciable
variance. This property is illustrated in Fig. 5-18
where the SNR = 10 dB curve diverges from the rest of the
plots for higher IFOV's.

The tradeoff between the SNR and IFOV is also
illustrated in Fig. 5-18 by observing that $\hat{P}_c$ is multiple
valued, i.e., the combination of SNR and IFOV that results
in a particular $\hat{P}_c$ is not unique. In the case under study
TABLE 5-15 SCANNER OUTPUT CLASSIFICATION ACCURACIES VS. IFOV
SNR = 10 DB, ADJACENT SAMPLE CORRELATION = 0.85
(accuracies in percent)

IFOV    $\hat{P}_{c|\omega_1}$    $\hat{P}_{c|\omega_2}$    $\hat{P}_{c|\omega_3}$    $\hat{P}_c$
1       53.5                      59.3                      57.8                      56.9
2       56.1                      64.6                      59.4                      60.0
3       58.7                      66.3                      60.3                      61.8
4       60.9                      68.3                      62.4                      63.9
5       66.8                      71.2                      64.5                      67.5
6       70.5                      74.1                      65.6                      70.1
7       72.5                      76.8                      70.0                      73.1
8       74.2                      79.2                      72.2                      75.2
FIG. 5-15 CLASSIFICATION ACCURACIES AT THE SCANNER OUTPUT VS.
ADDITIVE GAUSSIAN NOISE AND IFOV
(Plot: horizontal axis is scanner IFOV in high resolution pixels.)
TABLE 5-16 SCANNER OUTPUT CLASSIFICATION ACCURACIES VS. IFOV
SNR = 20 DB, ADJACENT SAMPLE CORRELATION = 0.85
(accuracies in percent)

IFOV    $\hat{P}_{c|\omega_1}$    $\hat{P}_{c|\omega_2}$    $\hat{P}_{c|\omega_3}$    $\hat{P}_c$
1       56.1                      63.0                      70.9                      63.4
2       58.9                      65.7                      71.0                      65.2
3       62.1                      67.7                      73.3                      67.7
4       66.7                      70.6                      75.1                      70.8
5       70.9                      75.1                      76.6                      74.2
6       74.3                      80.2                      78.5                      77.7
7       77.0                      83.3                      79.7                      80.0
8       79.9                      84.9                      82.1                      82.3






TABLE 5-17 SCANNER OUTPUT CLASSIFICATION ACCURACIES VS. IFOV
SNR = 30 DB, ADJACENT SAMPLE CORRELATION = 0.85
(accuracies in percent)

IFOV    $\hat{P}_{c|\omega_1}$    $\hat{P}_{c|\omega_2}$    $\hat{P}_{c|\omega_3}$    $\hat{P}_c$
1       56.2                      63.5                      71.7                      63.8
2       60.2                      65.7                      73.0                      66.3
3       62.9                      67.9                      75.5                      68.8
4       68.3                      70.8                      76.3                      71.8
5       71.5                      76.2                      77.8                      75.2
6       74.4                      80.3                      79.5                      78.1
7       77.1                      83.5                      81.6                      80.7
8       80.2                      85.1                      83.4                      82.9
FIG. 5-17 CLASSIFICATION ACCURACIES AT THE SCANNER OUTPUT VS.
ADDITIVE GAUSSIAN NOISE AND IFOV
(Plot: horizontal axis is scanner IFOV in high resolution pixels.)
a 70% classification accuracy can be achieved when IFOV = 6
pixels, SNR = 10 dB or IFOV = 3.8 pixels, SNR = 20 dB or
IFOV = 3.5 pixels and SNR = 30 dB. Equivalently, if the
system noise level is such that the SNR is at a low 10 dB,
then to achieve a prescribed minimum classification
performance the resulting data spatial resolution will
suffer. The same classification accuracy can be obtained
with a 60% improvement in spatial resolution if the MSS is
operating at a 30 dB SNR.
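
The operating points quoted above can be approximated from the tabulated overall accuracies by linear interpolation. The sketch below is illustrative only: the function name `ifov_for_accuracy` is hypothetical, the data are the $\hat{P}_c$ columns of Tables 5-15 through 5-17, and straight-line interpolation between integer IFOV values is an assumption (the values in the text were presumably read from Fig. 5-18).

```python
def ifov_for_accuracy(pc_by_ifov, target):
    """Smallest IFOV (pixels) at which the accuracy reaches `target`.

    pc_by_ifov[k] is the accuracy in percent at IFOV = k + 1.
    Linear interpolation is used between tabulated points.
    Returns None if the target is never reached.
    """
    if pc_by_ifov[0] >= target:
        return 1.0
    for k in range(1, len(pc_by_ifov)):
        lo, hi = pc_by_ifov[k - 1], pc_by_ifov[k]
        if lo < target <= hi:
            # lo belongs to IFOV = k, hi to IFOV = k + 1
            return k + (target - lo) / (hi - lo)
    return None


# Overall accuracy (percent) for IFOV = 1..8, rho_x = 0.85.
pc_by_snr = {
    10: [56.9, 60.0, 61.8, 63.9, 67.5, 70.1, 73.1, 75.2],  # Table 5-15
    20: [63.4, 65.2, 67.7, 70.8, 74.2, 77.7, 80.0, 82.3],  # Table 5-16
    30: [63.8, 66.3, 68.8, 71.8, 75.2, 78.1, 80.7, 82.9],  # Table 5-17
}

for snr_db, pc in pc_by_snr.items():
    print(f"SNR = {snr_db} dB: IFOV for 70% = {ifov_for_accuracy(pc, 70.0):.2f}")
```

The interpolated values come out near 6, 3.7 and 3.4 pixels for 10, 20 and 30 dB, in line with the figures quoted in the text.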


5.4 Summary and Conclusions 
The objective of this chapter was to employ the CSP 
error estimation technique and MSS model in an integrated 
parametric package that would produce the theoretical 
response of the MSS in a fully controllable environment. 

The results presented are not intended to be exhaustive
but rather to demonstrate the method and to illustrate
general trends in the system response. It is instructive
to compare the patterns observed with those obtained by
other simulation techniques.

A parallel study aimed at the same objectives is 
reported in [3]. High resolution (6 m) aircraft MSS data 
was considered with a cascade of simulated scanner PSF's 
to produce data sets with 30 m, 40 m, 50 m and 60 m ground 
resolutions and the classification performance was estimated 
for each case. The results provided less than conclusive 
evidence on the monotonic relationship between
classification performance and the IFOV due to the very small
rise in $P_c$ as IFOV was enlarged. This conclusion can be
fully understood from the theoretical curves of $P_c$ vs.
IFOV. The significant parameter, data spatial correlation,
is what determines how strongly classification performance
and IFOV are interrelated. As for a real data set, its
spatial correlation structure is a fixed parameter. In the
case of high resolution aircraft data, pixel-to-pixel
correlation can be as high as 0.9 or 0.95. Fig. 5-13
and 5-14, with $\rho_x = 0.9$ and 0.95 respectively, clearly
illustrate that $P_c$ and IFOV are indeed weakly coupled. Had
the data under investigation in [3] been less spatially
correlated, this coupling would have manifested itself more
strongly. For satellite data having a $\rho_x$ of about
0.75-0.8, $P_c$ shows considerably stronger sensitivity to
variations of IFOV.

The following conclusions emerge from the theoretical 
simulation of MSS spatial characteristics. 

1. The achievable classification performance monoton- 
ically increases with increasing IFOV, at the 
expense of spatial resolution. 

2. The degree of such dependence is directly related
to the extent of spatial correlation of the random
processes at the scanner input. A process with
a DC value alone will have identical classification
performance at the MSS input and output regardless
of IFOV.

3. Additive noise, by reducing the class separabilities,
produces a degradation of the classification
performance. For any fixed SNR, however, $P_c$ still
increases with increasing IFOV.

4. When a minimum classification performance is a
design parameter, Fig. 5-18 determines the required
operating states. For the test case under study,
given that min{$P_c$} = 70%, the lower bounds on IFOV
are 6, 3.75 and 3 low resolution pixels for SNR =
10 dB, 20 dB and 30 dB respectively.
CHAPTER 6

Conclusions and Observations

In this chapter we provide a broad evaluation of the
results of the study and the degree to which it has satisfied
the objectives put forth initially. The performance of the
CSP Bayes error estimator was, by far, the most
significant result. The transformation of ideas from abstract to
practical more often than not is limited by the finiteness
of the available resources; hence, it is often irrelevant
whether a method is theoretically sound. In this case, with
the exponential rise in the number of sampling cells due
to the dimensional effect, a requirement for more cells
per axis would have put the usefulness of the algorithm
in grave doubt. That this was not to be the case has been
amply demonstrated in the experimental results of chapter 4.
Admittedly the feature spaces considered cannot be
classified as being of high dimensionality, but within the
scope of the present and near future, MSS data gathered by
satellites will consist of four or five bands of visible
and infrared radiation. In fact even that may be reduced
if some of the bands prove to be redundant in the
preprocessing stages of data analysis. The systematic behavior
of the estimate vs. grid size is a characteristic that
provides some degree of a posteriori information. Knowing
that the estimator almost universally approaches the
Bayes error with a decreasing negative bias and the fact
that at about 8 cells per axis the estimate is within 1% of
its final value, one may select a small grid size and choose
to project the final value by heuristic or other numerical
techniques. This approach may be useful when the data is
of unusually high dimensionality.

There are undoubtedly a number of refinements that
could accelerate the rate of convergence even further. It
has been mentioned frequently that the boundary cells are
the primary source of the estimation error. By adopting
a larger grid, the measurement space is divided into finer
partitions indiscriminately. The optimum strategy should
sample the interior of $\Gamma_i$ as coarsely as possible and the
boundary region as finely as possible. One such technique
is to first 'detect' the boundary by a coarse grid and then
perform the partitioning by working around that segment
while leaving the interior grid intact. In implementing this
modification, however, close attention should be paid to
the theoretical convergence property of the modified
estimator. The choice of sampling grids other than binomial
may be considered, although one such grid with variable
cell size was employed with no discernible improvement in
performance.
The evaluation of the scanner spatial model provided
the first application of the CSP error estimator. Compared
to simulation techniques using a large data base,
manipulation of the MSS parameters proved to be much simpler.
The problem was simplified somewhat by the availability of
closed form relationships governing the input-output
statistical dependencies. This was possible because of the
particular approximating function chosen for the scanner's PSF.

The spatial characteristic function is by no means bound 
by such an assumption. The technique employed in Appendix A 
can be carried out for any specified PSF in which case the 
results in general are not in closed form. The observed 
response of the MSS was in close agreement with the reported 
results based on Monte-Carlo techniques. The primary 
difference was a far greater flexibility provided by the 
scanner model in examining the response to various parameter 
manipulations . 

The number and kind of potential applications of the 
analysis package developed here are far greater than 
there was space to elaborate. The spatial model can be 
expanded to include a greater range of noise levels and 
sources. It is possible to accumulate a catalogue of 
system response curves corresponding to combinations of 
different scanner and ground scene parameters. A set of
desired system parameters can be specified and the remaining
set determined from the theoretical response characteristics.
The availability of several different sets of parameters
to achieve a certain performance index underlines the
inherent tradeoffs in designing a multispectral scanner system.
The fundamental function of this parametric package,
therefore, is to provide for an easily implementable technique
to evaluate the system's performance with maximum
flexibility and minimum input information.
Appendix A

Multispectral Scanner Output Statistics 


In order to determine the effects of different scanner
IFOV's and their interaction with the classification accuracy
of a data set, it is essential that the required output
covariance matrices be parametrically represented in
terms of known input quantities. In sec. 3.2.2 it was
noted that the entire spectral covariance matrix is
specified if the appropriate spatial correlation functions
are known. Let $f(x,y)$, $g(x,y)$ and $h(x,y)$ denote the input
and output random processes associated with any two matching
bands and the scanner PSF, respectively. It is well known
that the above quantities are related by a convolution
integral.


$$g(x,y) = \iint f(x - \lambda_1,\, y - \lambda_2)\, h(\lambda_1, \lambda_2)\, d\lambda_1\, d\lambda_2$$ (A-1)


In order to derive specific results, two different scanner 
PSF's are considered: (a) a spherically symmetric Gaussian 

PSF; and (b) a rectangular PSF. The spatial correlation 
matrix describing the scene is a two sided exponential. 
A.1 Gaussian Scanner PSF

The PSF and spatial correlation model are given by

$$h(x,y) = c_1\, e^{-\frac{x^2 + y^2}{r_o^2}}, \qquad R_{ff}(\tau,\eta) = e^{-a|\tau|}\, e^{-a|\eta|}$$ (A-2)


where $\rho_o = e^{-a}$ is the adjacent pixel correlation, assumed
equal along the horizontal and vertical directions. This
assumption is not in contradiction with the fact that in a
digital data set sample-to-sample correlation is higher
than line-to-line correlation because of the closer
physical distance between the samples. In the continuous
domain, such as this formulation, where theoretically
equally spaced lines and columns can exist, there is little
reason for assuming different pixel-to-pixel correlation
along each direction. Two quantities, $c_1$ and $r_o$, specify
the PSF, where $c_1$ is a normalizing constant providing unity
gain and $r_o$ is the filter's characteristic length, closely
related to the IFOV.

With the parameters of the problem defined, the scanner
output correlation function can be expressed as

$$S_{gg}(u,v) = S_{ff}(u,v)\, |H(u,v)|^2$$ (A-3)

where $S(u,v)$ is the spectral density. Let $M(u,v) = |H(u,v)|^2$,
then

$$R_{gg}(\tau,\eta) = R_{ff}(\tau,\eta) * m(\tau,\eta)$$ (A-4)

$$m(\tau,\eta) = \frac{\pi c_1^2 r_o^2}{2}\, e^{-\frac{\tau^2}{2 r_o^2} - \frac{\eta^2}{2 r_o^2}}$$ (A-5)


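Using the separability property of the functions involved,

$$R_{gg}(\tau,\eta) = \int_{-\infty}^{\infty} R_{ff}(\tau - x)\, m(x)\, dx \int_{-\infty}^{\infty} R_{ff}(\eta - y)\, m(y)\, dy = A \times B$$ (A-6)

$$A = \frac{\sqrt{\pi}\, c_1 r_o}{\sqrt{2}} \left[ \int_{-\infty}^{\tau} e^{-a(\tau - x)}\, e^{-\frac{x^2}{2 r_o^2}}\, dx + \int_{\tau}^{\infty} e^{a(\tau - x)}\, e^{-\frac{x^2}{2 r_o^2}}\, dx \right]$$ (A-7)

Combining the exponentials and completing the squares,

$$A = \frac{\sqrt{\pi}\, c_1 r_o}{\sqrt{2}}\, e^{\frac{a^2 r_o^2}{2}} \left[ e^{-a\tau} \int_{-\infty}^{\tau} e^{-\frac{(x - a r_o^2)^2}{2 r_o^2}}\, dx + e^{a\tau} \int_{\tau}^{\infty} e^{-\frac{(x + a r_o^2)^2}{2 r_o^2}}\, dx \right]$$ (A-8)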
The individual integrals can now be represented in terms of
the Q function

$$A = \pi c_1 r_o^2\, e^{\frac{a^2 r_o^2}{2}} \left[ e^{-a\tau}\, \frac{1}{\sqrt{2\pi}} \int_{a r_o - \frac{\tau}{r_o}}^{\infty} e^{-\frac{x^2}{2}}\, dx + e^{a\tau}\, \frac{1}{\sqrt{2\pi}} \int_{a r_o + \frac{\tau}{r_o}}^{\infty} e^{-\frac{x^2}{2}}\, dx \right]$$

$$= \pi c_1 r_o^2 \left[ e^{\frac{a^2 r_o^2}{2} - a\tau}\, Q\!\left(a r_o - \frac{\tau}{r_o}\right) + e^{\frac{a^2 r_o^2}{2} + a\tau}\, Q\!\left(a r_o + \frac{\tau}{r_o}\right) \right]$$ (A-9)


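The constant $c_1$ is the solution to the following equation

$$\int_{-\infty}^{\infty} \sqrt{c_1}\, e^{-\frac{x^2}{r_o^2}}\, dx \int_{-\infty}^{\infty} \sqrt{c_1}\, e^{-\frac{y^2}{r_o^2}}\, dy = 1$$

Therefore,

$$c_1 = \frac{1}{\pi r_o^2}$$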


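Noting that B is similarly evaluated, the spatial
correlation function at the scanner output is given by

$$R_{gg}(\tau,\eta) = \left[ e^{\frac{a^2 r_o^2}{2} - a\tau}\, Q\!\left(a r_o - \frac{\tau}{r_o}\right) + e^{\frac{a^2 r_o^2}{2} + a\tau}\, Q\!\left(a r_o + \frac{\tau}{r_o}\right) \right] \times \left[ e^{\frac{a^2 r_o^2}{2} - a\eta}\, Q\!\left(a r_o - \frac{\eta}{r_o}\right) + e^{\frac{a^2 r_o^2}{2} + a\eta}\, Q\!\left(a r_o + \frac{\eta}{r_o}\right) \right]$$ (A-11)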
The above relationship can be easily modified to cover
the case of unequal pixel-to-pixel correlation along the
cross track and down track directions. If $R_{ff}(\tau,\eta)$ is given by

$$R_{ff}(\tau,\eta) = e^{-a|\tau|}\, e^{-b|\eta|}$$

then it follows that

$$R_{gg}(\tau,\eta) = \left[ e^{\frac{a^2 r_o^2}{2} - a\tau}\, Q\!\left(a r_o - \frac{\tau}{r_o}\right) + e^{\frac{a^2 r_o^2}{2} + a\tau}\, Q\!\left(a r_o + \frac{\tau}{r_o}\right) \right] \times \left[ e^{\frac{b^2 r_o^2}{2} - b\eta}\, Q\!\left(b r_o - \frac{\eta}{r_o}\right) + e^{\frac{b^2 r_o^2}{2} + b\eta}\, Q\!\left(b r_o + \frac{\eta}{r_o}\right) \right]$$ (A-12)


Note that since the input process $f(x,y)$ has unity
variance, $R_{gg}(0,0)$ is in effect a weighting by which any input
variance will be multiplied to produce the corresponding
output spectral variance. The right hand side of (A-9),
therefore, can be considered as a weighting function
associated with any multiband scanner to relate input and output
statistics. Denote this function by $W_s(\tau,\eta,a,b)$.
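
The closed form for the Gaussian PSF is easy to evaluate and to check numerically. The sketch below is an illustration under stated assumptions: it implements the two-bracket product of (A-12) with $Q(x)$ taken as the standard Gaussian tail probability, and it verifies one bracket against a direct midpoint-rule evaluation of the convolution of $e^{-a|\tau|}$ with a unit-area Gaussian of standard deviation $r_o$. All function names are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def axis_factor(lag, decay, r_o):
    """One bracket of (A-12): contribution of a single axis at lag `lag`."""
    growth = math.exp(0.5 * (decay * r_o) ** 2)
    return growth * (
        math.exp(-decay * lag) * q_function(decay * r_o - lag / r_o)
        + math.exp(decay * lag) * q_function(decay * r_o + lag / r_o)
    )


def w_s_gaussian(tau, eta, a, b, r_o):
    """Weighting function W_s(tau, eta, a, b) for the Gaussian PSF."""
    return axis_factor(tau, a, r_o) * axis_factor(eta, b, r_o)


def axis_factor_numeric(lag, decay, r_o, steps=20000):
    """Midpoint-rule check of one bracket by direct integration."""
    half_width = 10.0 * r_o                     # Gaussian tails are negligible
    h = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        total += math.exp(-decay * abs(lag - x)) * math.exp(-x * x / (2.0 * r_o ** 2))
    return total * h / (math.sqrt(2.0 * math.pi) * r_o)


# Variance ratio at zero lag for rho_x = 0.6, rho_y = 0.7 and several r_o.
a = -math.log(0.6)
b = -math.log(0.7)
for r_o in (1, 2, 3, 4):
    print(f"r_o = {r_o}: W_s(0,0,a,b) = {w_s_gaussian(0, 0, a, b, r_o):.3f}")
```

For these correlations the zero-lag value is roughly 0.53 at $r_o = 1$ and 0.14 at $r_o = 4$, and the decrease from one $r_o$ to the next shrinks as $r_o$ grows.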

The next item of interest is the output
crosscorrelation among channels. This quantity, designated by
$R_{g_i g_j}(\tau,\eta)$, is a straightforward extension of the
method just described. Again assuming a Markov or exponential
structure governing the crosscorrelation function between
channels

$$R_{f_i f_j}(\tau,\eta) = r_{f_i f_j}\, \sigma_{f_i}\, \sigma_{f_j}\, e^{-a_{ij}|\tau|}\, e^{-b_{ij}|\eta|}$$ (A-13)


where $r_{f_i f_j}$ is the spectral crosscorrelation coefficient at
the input such that $|r_{f_i f_j}| \le 1$. $a_{ij}$ and $b_{ij}$ are defined
similar to $a$ and $b$. Since the crosscorrelation function
between a pair of outputs of a linear system is related to
the input crosscorrelation of the same pair in a form
similar to (A-3) [166], i.e.

$$S_{g_i g_j}(u,v) = S_{f_i f_j}(u,v)\, |H(u,v)|^2$$ (A-14)


the same technique used previously provides the acrossband
correlation function at the MSS output.

$$R_{g_i g_j}(\tau,\eta) = r_{f_i f_j}\, \sigma_{f_i}\, \sigma_{f_j}\, W_s(\tau,\eta,a_{ij},b_{ij})$$ (A-15)

From (A-13), the crosscorrelation coefficient between any
two channels at the scanner output is

$$r_{g_i g_j} = \frac{R_{g_i g_j}(0,0)}{\sigma_{g_i}\, \sigma_{g_j}}$$ (A-16)

From (A-15)

$$R_{g_i g_j}(0,0) = r_{f_i f_j}\, \sigma_{f_i}\, \sigma_{f_j}\, W_s(0,0,a_{ij},b_{ij})$$ (A-17)

also

$$\sigma_{g_i}^2 = \sigma_{f_i}^2\, W_s(0,0,a_{ii},b_{ii})$$ (A-18)

hence

$$r_{g_i g_j} = r_{f_i f_j}\, \frac{W_s(0,0,a_{ij},b_{ij})}{W_s^{1/2}(0,0,a_{ii},b_{ii})\, W_s^{1/2}(0,0,a_{jj},b_{jj})}$$ (A-19)


Therefore, the band-to-band correlation coefficients
are identical at scanner input and output provided the
spatial auto and crosscorrelation functions at the
input are equivalent, i.e., $a_{ii} = a_{ij}$, $b_{ii} = b_{ij}$.
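
The ratio in (A-19) can be evaluated directly once the zero-lag weighting function is available. The following sketch assumes the Gaussian-PSF form of $W_s(0,0,a,b)$, i.e. the product of the two brackets of (A-12) at $\tau = \eta = 0$; the function names and the numerical decay parameters are hypothetical and chosen only to illustrate the behavior.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def w_s_zero_lag(a, b, r_o):
    """Gaussian-PSF weighting function at zero lag, W_s(0, 0, a, b)."""
    def factor(decay):
        return 2.0 * math.exp(0.5 * (decay * r_o) ** 2) * q_function(decay * r_o)
    return factor(a) * factor(b)


def output_band_correlation(r_in, auto_i, auto_j, cross, r_o):
    """Output band-to-band correlation coefficient following (A-19).

    auto_i, auto_j and cross are (a, b) pairs: the spatial decay
    parameters of the two autocorrelations and of the crosscorrelation.
    """
    numerator = w_s_zero_lag(cross[0], cross[1], r_o)
    denominator = math.sqrt(
        w_s_zero_lag(auto_i[0], auto_i[1], r_o)
        * w_s_zero_lag(auto_j[0], auto_j[1], r_o)
    )
    return r_in * numerator / denominator


# Equal spatial parameters: the input coefficient is preserved.
same = output_band_correlation(0.6, (0.3, 0.4), (0.3, 0.4), (0.3, 0.4), r_o=2.0)

# Crosscorrelation decaying faster than the autocorrelations:
# the output coefficient is reduced.
faster = output_band_correlation(0.6, (0.3, 0.4), (0.3, 0.4), (0.5, 0.6), r_o=2.0)

print(f"equal parameters: r_out = {same:.3f}")
print(f"faster cross decay: r_out = {faster:.3f}")
```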
A.2 Rectangular Scanner PSF

A rectangular point spread function is defined here by

$$h(x,y) = \begin{cases} \dfrac{1}{r_o^2}, & |x|, |y| \le \dfrac{r_o}{2} \\ 0, & \text{otherwise} \end{cases}$$ (A-20)


Similarly to (A-4 ) ; 

Rgg(t f n) = n) *m(x ,ri) 

where 


m(T,n) = | t | , | n | - r (A-21) 


Employing (A-6) with (A-21)

$$A(\tau) = \frac{1}{r_o}\int_{-r_o}^{r_o} e^{-a|\tau-x|}\left(1-\frac{|x|}{r_o}\right)dx = \frac{1}{r_o}\left[\int_{-r_o}^{0} e^{-a(\tau-x)}\left(1+\frac{x}{r_o}\right)dx + \int_{0}^{\tau} e^{-a(\tau-x)}\left(1-\frac{x}{r_o}\right)dx + \int_{\tau}^{r_o} e^{a(\tau-x)}\left(1-\frac{x}{r_o}\right)dx\right] \tag{A-22}$$


Designating the three terms in the bracket, each including the factor $1/r_o$, as I, II and III, routine integration techniques yield the following results

$$\mathrm{I} = \frac{1}{a r_o}\left(1 - \frac{1-e^{-a r_o}}{a r_o}\right)e^{-a\tau} \tag{A-23a}$$

$$\mathrm{II} = \frac{1}{a r_o}\left[1 - \frac{\tau}{r_o} + \frac{1}{a r_o} - \left(1+\frac{1}{a r_o}\right)e^{-a\tau}\right] \tag{A-23b}$$

$$\mathrm{III} = \frac{1}{a r_o}\left[1 - \frac{\tau}{r_o} - \frac{1}{a r_o}\left(1 - e^{-a r_o}\,e^{a\tau}\right)\right] \tag{A-23c}$$

A similar expression is obtained for B by substituting $\eta$ and $b$ for $\tau$ and $a$ in (A-23a) through (A-23c). The scanner characteristic function, $W_s(0,0,a,b)$, is evaluated by setting $\tau = \eta = 0$ in I, II and III;


$$W_s(0,0,a,b) = \frac{2}{a r_o}\left(1 - \frac{1-e^{-a r_o}}{a r_o}\right) \times \frac{2}{b r_o}\left(1 - \frac{1-e^{-b r_o}}{b r_o}\right) \tag{A-24}$$

(A-24) is plotted in sec. 3.2.2 for different values of $a$, $b$ and $r_o$.
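As a numerical cross-check, the closed form (A-24) can be compared with a direct integration of the one-axis integral in (A-22) at $\tau = 0$, and the result can be fed into (A-19). The Python sketch below is illustrative only: the function names and the parameter values are assumptions made here, not part of the original work.

```python
import math

def axis_factor(a, r_o):
    """One-axis factor of (A-24): (2/(a r_o)) * (1 - (1 - exp(-a r_o)) / (a r_o))."""
    x = a * r_o
    return (2.0 / x) * (1.0 - (1.0 - math.exp(-x)) / x)

def w_s_origin(a, b, r_o):
    """W_s(0,0,a,b) for the rectangular PSF as the product of two axis factors."""
    return axis_factor(a, r_o) * axis_factor(b, r_o)

def axis_factor_numeric(a, r_o, steps=20000):
    """Midpoint-rule value of (1/r_o) * integral over [-r_o, r_o] of
    exp(-a|x|) * (1 - |x|/r_o) dx, i.e. the one-axis integral at zero lag."""
    dx = 2.0 * r_o / steps
    total = 0.0
    for k in range(steps):
        x = -r_o + (k + 0.5) * dx
        total += math.exp(-a * abs(x)) * (1.0 - abs(x) / r_o)
    return total * dx / r_o

def output_coefficient(r_in, a_ii, b_ii, a_jj, b_jj, a_ij, b_ij, r_o):
    """Band-to-band correlation coefficient at the scanner output, per (A-19)."""
    num = w_s_origin(a_ij, b_ij, r_o)
    den = math.sqrt(w_s_origin(a_ii, b_ii, r_o) * w_s_origin(a_jj, b_jj, r_o))
    return r_in * num / den

# Hypothetical parameter values, chosen only for illustration.
closed = axis_factor(0.5, 3.0)
numeric = axis_factor_numeric(0.5, 3.0)
same = output_coefficient(0.8, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 3.0)
diff = output_coefficient(0.8, 0.5, 0.5, 0.5, 0.5, 0.9, 0.9, 3.0)
print(closed, numeric, same, diff)
```

With equal auto and cross parameters the input coefficient is returned unchanged; in this example a faster-decaying crosscorrelation lowers the output coefficient.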


Appendix B 

Exponential Spatial Correlation Function Simulator 


As a part of utilizing a completely simulated data base, access to one with a Markov-structured spatial correlation is required. In order to obtain such a set, a white noise process can be transmitted through an appropriate filter.

Let $f(x,y)$, $g(x,y)$ and $h(x,y)$ be the input, the output and the desired filter, then

$$S_{gg}(u,v) = S_{ff}(u,v)\,|H(u,v)|^2 \tag{B-1}$$

Since $S_{ff}(u,v) = 1$ for white noise and the desired spectral density function is $S_{gg}(u,v) = \dfrac{2a}{a^2+u^2}\cdot\dfrac{2b}{b^2+v^2}$, therefore

$$\frac{2a}{a^2+u^2}\cdot\frac{2b}{b^2+v^2} = H(u,v)\,H^*(u,v) \tag{B-2}$$

It then follows that the desired PSF is a one-sided exponential, i.e.

$$h(x,y) = c\,e^{-ax}\,e^{-by} \tag{B-3}$$

Let $r_x = 1/a$ and $r_y = 1/b$ be the filter's characteristic lengths along the x and y directions. Then

$$h(x,y) = c\,e^{-x/r_x}\,e^{-y/r_y}, \qquad x,y > 0 \tag{B-4}$$

where $c$ is a normalizing constant providing unity filter gain. Since this filter operates exclusively on digital data, formulation of the problem will be entirely in the discrete domain. Let the filter's length, in pixels, be $N_o$. In order to solve for $c$, the following should hold

$$\sum_{m=0}^{N_o-1}\sum_{n=0}^{N_o-1} h(m,n) = \sum_{m}\sum_{n} c\,e^{-m/r_x}\,e^{-n/r_y} = \Big(\sum_{m} c_1\,e^{-m/r_x}\Big)\Big(\sum_{n} c_2\,e^{-n/r_y}\Big) = 1 \tag{B-5}$$


By equating the individual terms to 1, unity gain will exist along the individual axes as well.

$$\sum_{m=0}^{N_o-1} c_1\,e^{-m/r_x} = c_1\left(1 + e^{-1/r_x} + e^{-2/r_x} + \dots + e^{-(N_o-1)/r_x}\right)$$

Let $\rho_x = e^{-1/r_x}$, the adjacent sample correlation, then

$$\sum_{m=0}^{N_o-1} \rho_x^{m} = \frac{\rho_x^{N_o}-1}{\rho_x-1}$$

therefore

$$c_1 = \frac{\rho_x-1}{\rho_x^{N_o}-1} \tag{B-6}$$

Similarly, defining $\rho_y = e^{-1/r_y}$ as the adjacent line correlation,

$$c_2 = \frac{\rho_y-1}{\rho_y^{N_o}-1} \tag{B-7}$$

but

$$c = c_1 c_2$$

then

$$c = \left(\frac{\rho_x-1}{\rho_x^{N_o}-1}\right)\left(\frac{\rho_y-1}{\rho_y^{N_o}-1}\right) \tag{B-8}$$
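The normalization in (B-6) through (B-8) is easy to verify numerically. The following Python sketch builds the sampled kernel of (B-4) and sums it; the function names and the values of $r_x$, $r_y$ and $N_o$ are hypothetical and serve only as an illustration.

```python
import math

def normalizing_constants(r_x, r_y, n_o):
    """Return (c1, c2, c) following (B-6), (B-7) and (B-8)."""
    rho_x = math.exp(-1.0 / r_x)   # adjacent sample correlation
    rho_y = math.exp(-1.0 / r_y)   # adjacent line correlation
    c1 = (rho_x - 1.0) / (rho_x ** n_o - 1.0)
    c2 = (rho_y - 1.0) / (rho_y ** n_o - 1.0)
    return c1, c2, c1 * c2

def kernel(r_x, r_y, n_o):
    """Sampled one-sided exponential h(m, n) of (B-4) for 0 <= m, n < n_o."""
    _, _, c = normalizing_constants(r_x, r_y, n_o)
    return [[c * math.exp(-m / r_x) * math.exp(-n / r_y) for n in range(n_o)]
            for m in range(n_o)]

# Hypothetical characteristic lengths and filter length, in pixels.
h = kernel(r_x=2.0, r_y=3.0, n_o=16)
gain = sum(sum(row) for row in h)
print(gain)  # expected to be 1 up to rounding error
```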


Since the exponential filter, in addition to generating a pixel-to-pixel correlation, alters the statistical properties of the input as well, a knowledge of that effect is necessary. Let the input to this filter consist of N random processes corresponding to N spectral bands. The input and output are related by the following discrete convolution;

$$g(m,n) = \sum_{i=0}^{N_o-1}\sum_{j=0}^{N_o-1} f(i+m,\,j+n)\,W(i,j) \tag{B-9a}$$

where

$$W(i,j) = h(N_o-1-i,\;N_o-1-j) \tag{B-9b}$$

and no subscript on $g(m,n)$ and $f(m,n)$ designates any two corresponding bands. The output spatial correlation is then given by

$$R_{gg}(\tau,\eta) = E\{g(m+\tau,\,n+\eta)\,g(m,n)\} = E\Big\{\sum_{i=0}^{N_o-1}\sum_{j=0}^{N_o-1}\sum_{k=0}^{N_o-1}\sum_{l=0}^{N_o-1} f(i+m+\tau,\,j+n+\eta)\,f(k+m,\,l+n)\,W(i,j)\,W(k,l)\Big\} \tag{B-10}$$

Moving the expectation inside;

$$E\{f(i+m+\tau,\,j+n+\eta)\,f(k+m,\,l+n)\} = R_{ff}(i-k+\tau,\;j-l+\eta) \tag{B-11}$$

Since $f(x,y)$ is a white noise process:

$$R_{ff}(\tau,\eta) = \begin{cases} 0, & \tau,\eta \ne 0 \\ \sigma_f^2, & \tau,\eta = 0 \end{cases} \tag{B-12}$$

Therefore, $R_{ff}(i-k+\tau,\,j-l+\eta)$ is non-zero if the following is satisfied

$$i-k+\tau = 0 \;\rightarrow\; i = k-\tau, \qquad j-l+\eta = 0 \;\rightarrow\; j = l-\eta \tag{B-13}$$

Substituting (B-13) in (B-10)

$$R_{gg}(\tau,\eta) = \sigma_f^2 \sum_{k=\tau}^{N_o-1}\sum_{l=\eta}^{N_o-1} W(k,l)\,W(k-\tau,\,l-\eta)$$

From (B-9b)

$$W(k,l) = c\,e^{\frac{k-(N_o-1)}{r_x}}\,e^{\frac{l-(N_o-1)}{r_y}} = W(0,0)\,\rho_x^{-k}\,\rho_y^{-l} \tag{B-15a}$$

where

$$W(0,0) = c\,e^{-\frac{N_o-1}{r_x}}\,e^{-\frac{N_o-1}{r_y}} \tag{B-15b}$$

so that

$$R_{gg}(\tau,\eta) = \sigma_f^2\,W^2(0,0)\,\rho_x^{-\tau}\rho_y^{-\eta}\left(\frac{\rho_x^{-2(N_o-\tau)}-1}{\rho_x^{-2}-1}\right)\left(\frac{\rho_y^{-2(N_o-\eta)}-1}{\rho_y^{-2}-1}\right) \tag{B-16}$$


The variance of the output process is therefore given by

$$\sigma_g^2 = \sigma_f^2\,W^2(0,0)\left(\frac{\rho_x^{-2N_o}-1}{\rho_x^{-2}-1}\right)\left(\frac{\rho_y^{-2N_o}-1}{\rho_y^{-2}-1}\right) \tag{B-17}$$

This result approaches the continuous version for large $N_o$, $r_x$ and $r_y$, i.e. $N_o/r_x \gg 1$ and $\rho_x \approx 1 - 1/r_x$, and similarly for $\rho_y$. Under these conditions

$$\sigma_g^2/\sigma_f^2 \approx \frac{1}{4\,r_x r_y} \tag{B-18}$$
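The variance expressions can be checked against each other numerically. The sketch below, under assumed parameter values, compares the sum of squared filter weights with the closed form (B-17) and with the large-filter approximation (B-18); all names are hypothetical and introduced here for illustration only.

```python
import math

def constants(r_x, r_y, n_o):
    """rho_x, rho_y and the normalizing constant c of (B-8)."""
    rho_x = math.exp(-1.0 / r_x)
    rho_y = math.exp(-1.0 / r_y)
    c = ((rho_x - 1.0) / (rho_x ** n_o - 1.0)) * ((rho_y - 1.0) / (rho_y ** n_o - 1.0))
    return rho_x, rho_y, c

def variance_ratio_direct(r_x, r_y, n_o):
    """sigma_g^2 / sigma_f^2 as the sum of squared weights (white-noise input)."""
    rho_x, rho_y, c = constants(r_x, r_y, n_o)
    sx = sum(rho_x ** (2 * m) for m in range(n_o))
    sy = sum(rho_y ** (2 * n) for n in range(n_o))
    return c * c * sx * sy

def variance_ratio_closed(r_x, r_y, n_o):
    """sigma_g^2 / sigma_f^2 from the closed form (B-17)."""
    rho_x, rho_y, c = constants(r_x, r_y, n_o)
    w00 = c * rho_x ** (n_o - 1) * rho_y ** (n_o - 1)           # (B-15b)
    gx = (rho_x ** (-2 * n_o) - 1.0) / (rho_x ** (-2) - 1.0)
    gy = (rho_y ** (-2 * n_o) - 1.0) / (rho_y ** (-2) - 1.0)
    return w00 * w00 * gx * gy

# Hypothetical values with N_o / r >> 1 and r >> 1 so that (B-18) applies.
r_x, r_y, n_o = 20.0, 20.0, 200
direct = variance_ratio_direct(r_x, r_y, n_o)
closed = variance_ratio_closed(r_x, r_y, n_o)
approx = 1.0 / (4.0 * r_x * r_y)                                 # (B-18)
print(direct, closed, approx)
```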


Since the primary purpose of this filter is the generation of a pixel-to-pixel correlation of some prescribed value, the following should hold

$$R_{gg}(1,0) = \rho_x\,\sigma_g^2 \tag{B-19}$$

$$R_{gg}(0,1) = \rho_y\,\sigma_g^2 \tag{B-20}$$

Let $\tau = 1$ and $\eta = 0$ in (B-16). Using the approximation $\rho_x^{-2(N_o-1)} - 1 \approx \rho_x^{-2(N_o-1)}$ it immediately follows that

$$R_{gg}(1,0) \approx \rho_x\,\sigma_g^2 \tag{B-21}$$

and similarly for $R_{gg}(0,1)$.
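The quality of the approximation behind (B-21) can be gauged by evaluating the lagged sum of filter weights directly. Because the filter is separable, the ratio $R_{gg}(\tau,0)/R_{gg}(0,0)$ depends only on the weights along one axis. The Python sketch below uses assumed values for the characteristic length and filter length; the function names are hypothetical.

```python
import math

def lagged_sum(rho, n_o, lag):
    """Sum over k = lag .. n_o-1 of W1(k) * W1(k - lag), where
    W1(k) = rho**(n_o - 1 - k) is the 1-D weight up to a constant factor."""
    w = [rho ** (n_o - 1 - k) for k in range(n_o)]
    return sum(w[k] * w[k - lag] for k in range(lag, n_o))

def normalized_correlation(r, n_o, lag):
    """R_gg(lag, 0) / R_gg(0, 0); constants and the other axis cancel."""
    rho = math.exp(-1.0 / r)
    return lagged_sum(rho, n_o, lag) / lagged_sum(rho, n_o, 0)

# Hypothetical characteristic length (pixels) and filter length.
r, n_o = 4.0, 32
rho = math.exp(-1.0 / r)
lag1 = normalized_correlation(r, n_o, 1)
lag2 = normalized_correlation(r, n_o, 2)
print(lag1, rho, lag2, rho ** 2)
```

For a filter several characteristic lengths long, the lag-one ratio is very close to $\rho_x$ and the lag-two ratio to $\rho_x^2$, as expected of an exponential correlation.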

The next topic is the across-band statistical and spatial effects the exponential filter might have had on the multispectral white noise. Following an exact analog of the derivation presented so far, the crosscorrelation function for any two bands at the output, $g_i$ and $g_j$, is given by

$$R_{g_i g_j}(\tau,\eta) = W^2(0,0)\,\rho_x^{-\tau}\rho_y^{-\eta}\left(\frac{\rho_x^{-2(N_o-\tau)}-1}{\rho_x^{-2}-1}\right)\left(\frac{\rho_y^{-2(N_o-\eta)}-1}{\rho_y^{-2}-1}\right) r_{f_i f_j}\,\sigma_{f_i}\sigma_{f_j} \tag{B-22}$$


where the input crosscorrelation function is defined

$$R_{f_i f_j}(\tau,\eta) = \begin{cases} r_{f_i f_j}\,\sigma_{f_i}\sigma_{f_j}, & \tau,\eta = 0 \\ 0, & \tau,\eta \ne 0 \end{cases} \tag{B-23}$$


The band-to-band correlation coefficient at the output is given by

$$r_{g_i g_j} = \frac{R_{g_i g_j}(0,0)}{\sigma_{g_i}\sigma_{g_j}} \tag{B-24}$$

It then immediately follows that

$$r_{g_i g_j} = r_{f_i f_j} \tag{B-25}$$

i.e. the correlation coefficients have undergone no change under this transformation.
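A small Monte Carlo experiment can illustrate (B-25). The Python sketch below generates two white Gaussian bands with a prescribed correlation coefficient, passes both through the same separable exponential filter, and estimates the coefficient before and after. It is an illustration under stated assumptions, not the simulator used in this work: the image size, filter length, characteristic lengths, random seed and all function names are hypothetical.

```python
import math
import random

def exponential_weights(r, n_o):
    """1-D unity-gain weights W1(k) = c1 * rho**(n_o-1-k), per (B-6) and (B-9b)."""
    rho = math.exp(-1.0 / r)
    c1 = (rho - 1.0) / (rho ** n_o - 1.0)
    return [c1 * rho ** (n_o - 1 - k) for k in range(n_o)]

def filter_rows(image, w):
    """For every row, out[n] = sum_j row[n + j] * w[j] (valid region only)."""
    n_o = len(w)
    return [[sum(row[n + j] * w[j] for j in range(n_o))
             for n in range(len(row) - n_o + 1)] for row in image]

def transpose(image):
    return [list(col) for col in zip(*image)]

def exponential_filter(image, r_x, r_y, n_o):
    """Separable form of (B-9a): filter the second index, then the first."""
    tmp = filter_rows(image, exponential_weights(r_y, n_o))
    return transpose(filter_rows(transpose(tmp), exponential_weights(r_x, n_o)))

def correlation_coefficient(u, v):
    """Sample correlation coefficient between two equally sized images."""
    xs = [x for row in u for x in row]
    ys = [y for row in v for y in row]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlated_white_bands(size, r_in, rng):
    """Two unit-variance white Gaussian bands with correlation coefficient r_in."""
    s = math.sqrt(1.0 - r_in ** 2)
    band_i, band_j = [], []
    for _ in range(size):
        row_i = [rng.gauss(0.0, 1.0) for _ in range(size)]
        row_j = [r_in * x + s * rng.gauss(0.0, 1.0) for x in row_i]
        band_i.append(row_i)
        band_j.append(row_j)
    return band_i, band_j

rng = random.Random(0)                       # fixed seed for repeatability
f_i, f_j = correlated_white_bands(96, 0.6, rng)
g_i = exponential_filter(f_i, 2.0, 2.0, 10)
g_j = exponential_filter(f_j, 2.0, 2.0, 10)
r_in_hat = correlation_coefficient(f_i, f_j)
r_out_hat = correlation_coefficient(g_i, g_j)
print(r_in_hat, r_out_hat)
```

Both estimates should lie near the prescribed 0.6, with the output estimate somewhat noisier because spatial correlation reduces the effective number of independent samples.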
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